doorkeeper-gem / doorkeeper

Doorkeeper is an OAuth 2 provider for Ruby on Rails / Grape.
https://doorkeeper.gitbook.io/guides/
MIT License
5.32k stars 1.07k forks source link

Let's encrypt credentials as ask by the RFC #675

Closed mtparet closed 5 years ago

mtparet commented 9 years ago

As specified by the rfc http://tools.ietf.org/html/rfc6819#section-5.1.4.1.3 we should not store the client_secret in clear text. (the client_secret is like a password)

tute commented 9 years ago

Absolutely, thanks for bringing this up. It's good we didn't release 3 yet.

Let's do it soon so we release 3.0.0.rc2, and a more secure v3. Can you start a PR, @mtparet?

mtparet commented 9 years ago

Do you think that during the transition we should accept both forms, or migrate the existing data?

As the v3 is a new major version, it's a good time to introduce this breaking change and migrate the existing data. We should release a new v2 version with warnings about client_secret not available anymore. The migration should be done using a safe way. (First, adding a new database field and deprecating the old one and in a next release removing the old field)

Would BCrypt be the best algorithm for doing so?

It seems everybody is recommending it so yes unless someone has a better proposition.

Can you start a PR, @mtparet?

I can't give a "yes sure" but I will try to find time for this.

adstage-david commented 8 years ago

@tute / @mtparet - my company wants this feature enough for me to take a crack at it. Was any progress made here or at least decisions made? Seems we should also treat tokens/refresh_tokens stored in the database similarly (as per #320)? I don't think bcrypt was the recommendation shaking out of the other thread - specifically because bcrypt is designed to be slow to prevent brute-forcing - would we want different algorithms between the two, or to use the same approach for all of the secrets?

tute commented 8 years ago

my company wants this feature enough for me to take a crack at it.

:clap:

Was any progress made here or at least decisions made?

There was no progress made on this (AFAIK). Any progress is needed and welcomed. In my ideal world, doorkeeper would not store grants or tokens altogether, and implement a solution along the lines of what @ahacking described in https://github.com/doorkeeper-gem/doorkeeper/issues/320#issuecomment-41063566.

Do you think you can work on that approach?

(looping in @diegode)

adstage-david commented 8 years ago

Okay - when we say "not store grants or tokens", we might still store an HMAC of the issued tokens to tie back later revocations, right? I imagine grants could be completely ephemeral - you don't need to be able to revoke them as far as I'm aware, especially since they're short lived (unless I'm missing something about the flows - we only use the "access grant" table to store the codes in the authorization code grant type, correct?)

If access grants are indeed short-lived/don't need revoked, my approach would be the following:

  1. AccessGrant: no longer stored at all, they are generated and signed, then forgotten. When exchanging for a full access token, we can validate the grant, check the client_secret and then give out the token.
  2. AccessToken: only HMAC of token stored in database to handle revocation - tokens can be validated by signature, and then we can look up HMAC in database to verify they have not been revoked
  3. Application: HMAC of client_secret+client_id stored for later lookup

For points 1 and 2 - a standard immediately comes to mind we could use here, rather than rolling our own solution (which generally I think I prefer): JWT. If JWT was a first class citizen in Doorkeeper, we might do the following:

  1. AccessGrant: generate JWT of the grant (with resource owner id, client id, etc.). Validate the JWT when exchanging for code
  2. AccessToken: generate a JWT token for the user, but store only the signature part of the token - validation of token can be done without database lookup, and then we can look up the signature in the table to check if it has been revoked to support http://tools.ietf.org/html/rfc7009
  3. Application: JWT would potentially not be as applicable here, but if we wanted a single way to do things we could generate a JWT of the client secret + client id and initial issuing time and then store the signature in the same way as AccessTokens. Along those lines - if we just treated the client secret as another kind of AccessToken we could validate and store like we store any other access token, potentially addressing #659 and having a single code path to verify both client secrets and access tokens?

The main downside with the JWT approach that I can see is just that tokens might become (theoretically) larger strings, which could be potentially problematic. If that's a concern we can stick with randomly generated (relatively short) strings as tokens and do the HMAC approach without introducing JWT, in which case I would probably use the same base algorithm JWT requires - HMAC SHA-256. Would we want to support multiple algorithms (getting back to not rolling our own too much and using built in JWT support)?

A danger that must be addressed going with the JWT approach is algorithm switching - https://auth0.com/blog/2015/03/31/critical-vulnerabilities-in-json-web-token-libraries/, so keeping that in mind would be important.

Thoughts? Personally I lean toward preferring JWT library to do the signing unless we consider the change in token size to be concerning. It allows us to lean more on already existing and (hopefully) tested work generating signatures on tokens, and also means that a choice of algorithm for end users would be easier to support if stronger or asymmetric encryption was desired.

adstage-david commented 8 years ago

One thing my proposal does not address - refresh tokens. To avoid storage of refresh tokens might be a bit more challenging to think about. Taking the JWT approach - they could be separately signed and unstored JWTs, but revoking refresh tokens is more important than access tokens in the presence of both, so we probably need storage. Taking a similar approach to the access tokens themselves, we could store refresh tokens in the same table and allow revoking both types.

For the identifiers of JWT we can look up in the database - looks like "jti" claim is the potential right approach, that we could use instead of storing the signature portion of the JWT. jti is intended to be a unique token identifier, which we could use specifically for revoking: https://auth0.com/blog/2015/03/10/blacklist-json-web-token-api-keys/.

adstage-david commented 8 years ago

Another sizable downside to the JWT approach is migration. I would propose a two step migration if we went the JWT route (my company has some long-lived token users we're going to need to push to the new token style with minimal disruption):

  1. A version that generates JWT tokens, but allows fallback to the stored tokens, allowing natural in-place upgrades as tokens expire
  2. A version that drops plaintext tokens and relies solely on the jti value that can be run after enough upgrades have taken place (or been forced).

This route sounds complicated and might be enough to warrant steering clear of the JWT route. If we go the route where we just store signatures of the plaintext tokens, we can update in place without any tokens going invalid, so the upgrade strategy is likely going to be much more smooth - all currently valid tokens remain valid. Then again, all tokens that have been exposed as plaintext in the database you might not want to keep valid - we could be papering over security holes with that approach.

jcbpl commented 8 years ago

I’d also like to help out with this if I may :smile:

I like the sound of the ephemeral JWT approach for access grants (authorization codes), and JWT/JTI for access and refresh tokens. I think it would be wise to persist the JTI for all issued/unexpired tokens (as opposed to just revoked tokens), because being able to list active tokens is needed to allow users to revoke a specific client’s access from the provider end (without knowing the specific token). That’s also most similar to the current implementation.

I don’t think I see a problem with maintaining backwards-compatability when presented with a non-JWT token; it seems like whether and when to force an upgrade to JWT tokens should be at each provider’s discretion. An existing implementation with a sane expiry/refresh config would probably see most of its tokens exchanged for JWTs fairly quickly.

Taking a similar approach to the access tokens themselves, we could store refresh tokens in the same table and allow revoking both types.

We’d probably want to store the refresh token’s JTI in the same row as the access token (like the plaintext token is now), instead of separately, so that revoking an access token also revokes the refresh token?

I’m going to try to learn more about the possible necessity of being able to revoke access grants, cause I don’t have a clear handle on that right now.

I don’t think the length of the JWTs would be a problem; as long as the JTIs are less than 255 chars I don’t think it would even require modifying the existing tables.

adstage-david commented 8 years ago

Maintaining backwards compatibility for us is important because our setup has been a little less than sane - we haven't actually exposed setup for oauth applications to end users, but have created tokens on their behalf, so we'll need to help them get new tokens without breaking anything. I figured there might be others like me that need to work towards a sane approach. :)

I'm still not to the point of it working at all, but I'll push up the branch with my proposed migrations for the end state of the tables so we can see how we feel about the schema at least.

adstage-david commented 8 years ago

The way I'm thinking the DB ends up is like this:

https://github.com/adstage/doorkeeper/commit/8b1ab66d4769e2fae28b03806e47eb1ed4ccdb5e

Note that client secrets are treated as separate JWTs in this approach, where the JTI value is the client id and the client secret is the full JWT. This allows for client secret rotation, which I believe might become more important when you can't retrieve the original secret, but that is not necessarily required for this PR though.

jcbpl commented 8 years ago

I’m less certain of the advantage of JWTs for the client secret. Disallowing application owners from retrieving the original secret seems like a significant departure from the current implementation. Would it be enough to encrypt the secret at rest for now? Or would it be sensible to recreate the secret from the JTI?

With regard to the latter option, Doorkeeper currently implements AccessToken.find_or_create_for; would preserving that behavior require recreating access tokens JWTs from the JTI, too?

adstage-david commented 8 years ago

I'm okay with encrypting the client secret at rest rather than coercing it into a JWT, I was just hoping to kill two birds with one stone here and have a single way to sign and encrypt values.

Not allowing secret retrieval is a bit weird and not necessarily required, as near as I can tell even big players like Facebook and Google still support retrieving the secret later on - so I'm thinking just encrypting at rest is probably the way to go for client_secrets. I was trying to meet @tute's ideal for all secrets, but it's not necessarily required in this case/might be overboard:

In my ideal world, doorkeeper would not store grants or tokens altogether, and implement a solution along the lines of what @ahacking described in #320 (comment).

adstage-david commented 8 years ago

Also, on the #find_or_create_for question - it appears to be desired functionality from https://github.com/doorkeeper-gem/doorkeeper/pull/387, so we'd probably want to keep some form of that feature around? A few approaches there:

  1. Generate new JWT with new iat claim and same jti/exp claims. This lets the token returned not have to be an exact match, but still act as a single token for revocation purposes. I'm a little fuzzy on whether this would be within the spirit of the jti claim for the JWT rfc, but seems like it's a reasonably valid option?
  2. Use a reproducible process to regenerate the same JWT again in #find_or_create_for, making sure we return the same token again - this holds little value over #1 as far as I can tell, and might still not necessarily return the same token?
  3. Separate revoked tokens table (maybe as an option), only store jtis in the revocation case, making it unnecessary to store tokens generally and addressing the spirit of what https://github.com/doorkeeper-gem/doorkeeper/pull/387 was built to work around - a constantly growing without bounds oauth tokens table. You lose the ability to list provisioned tokens, but gain a lot of storage space if you're talking about millions of tokens and are worried about the size of millions of tokens in the database.
  4. No longer support the option of generating similar tokens to save database size, get fresh jti claims per token request, allowing each to be revoked separately. Note that the only requirement for JTI claims is that they are collision resistant, and since we're generating them, they could just as easily be autoincrementing counters of some sort, allowing smaller size than the existing tokens being stored in the database, perhaps saving enough disk space to make the feature unnecessary?
adstage-david commented 8 years ago

Schema where we keep JWT for grants and access tokens but separately encrypt client secrets at rest: https://github.com/AdStage/doorkeeper/commit/a36ae6b04 a4ca7bf6a35f1b161613ba3c748e22e

jcbpl commented 8 years ago

I can definitely see the case for not storing grants or tokens at all, but I think there also are legitimate reasons to want to store them. Since that’s what the gem does now, the path of least resistance seems like it would be maintaining it. One potential non-breaking way to introduce that could be as an option or no-op ORM adapter; moving to a format like JWT would make either feasible.

Reusing access tokens also seems to be desired as recently as #661, so my vote is probably that whatever we do now preserves that according to your 1st or 2nd option.

Reading @ahacking’s comment on #320 again, one shortcoming of JWT that stands out is the fact that the payload is not encrypted. To add that, I think we’d need to use something like Fernet. It doesn’t have the same transparent or defined schema as JWT, but it’s a better alternative if the access grant payload is more sensitive.

So I wonder if the best approach might be to represent tokens through an adapter that dictates which part of the token is stored and how it’s verified. That way we could preserve the advantages of an interoperable token format without having to pick just one; it also seems to be in the spirit of the existing "access_token_generator" setting.

If we go that route, dealing separately with encrypting the client secret at rest might make the most sense.

ahacking commented 8 years ago

I would also be careful in the design about separation of token from other related information relied upon in access decisions.

A solid design would ensure all related information at rest that is relied upon is bound cryptographically and authenticated using a MAC derived from a secret known only to the server.

Modifying any of the token data or related meta data at rest should not go undetected; it should not be possible to inject tokens or change grants, expiry, client id, user id, app id etc.

jcbpl commented 8 years ago

Thanks. I think to achieve that and still support reusing access tokens we'll need to store the signature of the JWT instead of JTI, and then verify it when we reissue the access token.

adstage-david commented 8 years ago

@ahacking - that's one of the reasons I like the JWT approach - scopes, expiry, client id can all be encoded and signed in a verifiable way and not even have to be stored in the database if desired. The database storage of that information would be merely informational for showing to end user if we want to show what "apps" have access to their info (+when, and what scope), but the token would be signed with the secret known by the server and we'd rely on the info within the verified token to actually check permissions.

We don't want to store the signature to prevent tampering @jakepaul, as long as we only rely on signed data within the JWT itself, we can confirm that no tampering occurred. (Which means we'd end up needing to store scope, expiration and application id in there). If we store the signature, then we could end up in a case where we're essentially leaking tokens again, or we need to store less information about the token, potentially limiting the usefulness of "authorized_applications" page. To see this, consider two cases:

JWT signature known by DB

header of JWT is known plaintext payload of JWT is unknown-ish signature of JWT is known (because we store it and attacker found it)

For raw JWT, the only part of the token the attacker needs at this point is to figure out to impersonate the user is the right plaintext payload to generate the signature. If this is based on claims like application id, scope, and expire time, which are also stored within the database, it becomes pretty trivial to generate most of the plaintext, and the only viable field to avoid this becoming a leak of the full text of the token is jti - which now has to be a cryptographically secure random value that is about the same as access_token as far as brute-forcing goes, and can't be stored anywhere on the server.

Note that doing it this way means that there is no practical way to reuse tokens within the DB as we were discussing previously - you'd have to generate an exactly equivalent signature without knowing the jti, which we've established you can't know without exposing the token in the DB dump.

Storing JTI

If we go with just storing jti: header of JWT is known plaintext payload of JWT is known signature is unknown

Since the signature can only be determined with the secret, we play to the strengths of the HMAC algorithm which is meant to guarantee exactly this scenario - no tampering of a payload in the case that the plaintext is known to all parties, including the attacker who wants to generate new or modify existing messages, and no way to impersonate a user after a database leak because the attacker cannot generate a valid signature without the secret.

Note that with either approach, we must only rely on data that is within the signed block of the JWT for actual authentication purposes, anything else stored in the database becoming merely informational for the purposes of helping to mark a JTI as invalid.

As far as I can tell, either solution is still vulnerable to an attacker with database access being able to un-revoke, or forcibly mass revoke tokens, though I think these attacks are less risky (in case of un-revoke, the token must be known, and if it has a good expiration, will only be known for so long). If the vector is a concern, we could sign JTI + revoke time (including nil), to enforce validity of revocations?

adstage-david commented 8 years ago

Note that if payload transparency is not a desirable feature, we could investigate JWE - another point for adapter based approach. One thing with adapter based approach that comes to mind though - I think what you want to store in the db generally going to be pretty closely tied to how you want to generate tokens, might not make sense to have ORM and token validation/generation be two separate adapters? JWT without revocation could have an adapter with no ORM at all, for example.

ahacking commented 8 years ago

I certainly like the use of JWT as a token format, don't see any reason not to go with that approach given its RFC status.

So the remaining crux is what knoweldge the server needs to maintain to support revocation and listing authorised apps.

The fast past of verifying a token doesn't need the db for anything but revocation checking. Some clever approaches could be used here to avoid dB access such as false positive hash table cache of truncated signatures or the JTI (which is effectively a nonce value in the JWT used to prevent replay), if there is a positive match it could be false and only then is a db access used to determine if the token really is revoked.

The remainder of what to store is how much to keep for 'authorised apps page' and this is a slippery slope and everyone's use case will vary regarding how much info to keep for display/presentation purposes. It probably is a case of store the entire JWT and any meta data such as its revocation status encrypted and MAC'd with a server secret.

Detecting removal of tokens is a trickier problem and needs a hash of the full token table. This could use a kind of block chain or merkle tree so that omission of any token can be detected but you will need to be careful with concurrency.

Detecting unauthorised removal or modification of tokens and meta data wouldn't be in the fast path but more of a startup sanity /verification check and a separate/periodic audit function. There are numerous approaches to the integrity mechanism that will have different storage and performance tradeoffs.

If doorkeeper wants to be agnostic to using JWT or some other method then the design will be far more complex with adapters and mechanisms and varied database table structures per mechanism.

It seems doorkeeper would be best focused on JWT and get agreement that JWT is the way forward. Migrating from old world to new world would require a token reissuance process but I'd avoid trying to support both schemes because of the complexity.

adstage-david commented 8 years ago

The remainder of what to store is how much to keep for 'authorised apps page' and this is a slippery slope and everyone's use case will vary regarding how much info to keep for display/presentation purposes. It probably is a case of store the entire JWT and any meta data such as its revocation status encrypted and MAC'd with a server secret.

Do we think it's legitimately dangerous to store the token metadata info here unencrypted, or just precautionary? I don't really see a point in encrypting everything wholesale except as a separate requirement where basically any data leaking from the database in plaintext is not good. As long as tokens are unguessable - our worry about this attack vector is lessened significantly (we often use implicit, short-lived tokens, which means we can't really trust that the token is being used by the correct client, and unauthorized use of the API is the main damage, not reading what we have stored).

Detecting unauthorised removal or modification of tokens and meta data wouldn't be in the fast path but more of a startup sanity /verification check and a separate/periodic audit function. There are numerous approaches to the integrity mechanism that will have different storage and performance tradeoffs.

Definitely agree with this - wonder if there's a point at which this doesn't even really need to be handled by Doorkeeper itself, but is merely a consistency check against your recent database dumps to see if anything is amiss (surely someone that can delete data in the tokens table can think of more "fun" things to do?)

It seems doorkeeper would be best focused on JWT and get agreement that JWT is the way forward. Migrating from old world to new world would require a token reissuance process but I'd avoid trying to support both schemes because of the complexity.

Agreed wholeheartedly - smaller is better for this kind of thing. I still know we'll need an upgrade path at work personally, so I'd like to make it an option, but one that can be easily skipped if you have a better token reissuance process.

ahacking commented 8 years ago

The meta data doesn't necessarily need to be encrypted and some fields may be carried in columns to support search/query/indexing, eg JTI and expiry etc. but all meta data certainly should be MAC'd with the token using the servers secret key to prevent say SQL injection or other backend token injection without knowledge of the servers secret key. Generating the MAC would concatenate or combine the token and meta data, JWE could be used as the at rest format if doorkeeper decides to support that for client tokens by using a well worn code path but you want to be careful wrt to algorithm agility with a configured policy so that attackers can't select a weak mode for the at rest data. Depending on the cipher used (eg AES-GCM) both the token and meta data it would be encrypted and MAC'd in one pass.

adstage-david commented 8 years ago

Okay, attacking this piecemeal so it's not a gigantic mess to review:

Here's an initial stab at encrypted client secrets in this branch: https://github.com/doorkeeper-gem/doorkeeper/compare/master...AdStage:encrypt-application-secrets

There's a few questions to address here:

  1. Is the algorithm choice okay like this? Do we need an IV to encrypt secrets that are essentially random bytes anyway? Do we roll our own a bit with stdlib, or introduce a good lightweight secondary dependency?
  2. Where does the encryption/decryption occur - ideally it probably shouldn't be part of the application model mixin itself, right? But in the global Doorkeeper module via Doorkeeper.encrypt / Doorkeeper.decrypt also seems flawed.
  3. How do we set up a global configuration option for the tests - I made it fallback to generating a random key and saving essentially in a global so that the tests would still all pass, but I don't want to leave that in. Would we want to go through every spec that needs a secret configured and set it up in the before block?
adstage-david commented 8 years ago

Phase 2 - get in JWT for access grants - https://github.com/AdStage/doorkeeper/compare/AdStage:encrypt-application-secrets...AdStage:introduce-jwt-tokens

Notes:

  1. Do we need to support outstanding grants, or do we invalidate them all after changes deployed?
  2. Turns out we still need the AccessGrant table, because we must ensure one-time use property, so we do end up storing something, but all the important stuff is validated and signed now, and no secret is visible.
KevinColemanInc commented 8 years ago

What is the status of this branch? Is someone actively working on this?

adstage-david commented 8 years ago

I have not had a chance to work on it since the holidays - I was hoping to get some feedback from the maintainers on my above branches before continuing much further with an approach that might need to be significantly refactored.

(Maybe bump @tute / others?)

tute commented 8 years ago

What is the status of this?

I am interested in working on this, but don't have the availability. doorkeeper could be more secure, and I look forward to seeing that happening.

nbulaj commented 6 years ago

Hi @adstage-david . Do you still interested in this feature?

bpieck commented 6 years ago

I don't get the issue, why anyone would prefer jwt above bcrypt/scrypt/some-similar-hashing-algorithm here. With showing secret only on creation, hashing it with some algorithm before putting it in the database, and giving the application-owner the possibility to generate a new secret later, you even secure a security whole on the server itself on beeing the issue of someone impersonating some confidential client. And I think this is the standard approach in several similar scenarios. But most probable you already put some work in this. If still anyone would like to try the hashing-approach, I could implement it as a poc.

emilebosch commented 5 years ago

Hey all, is there any reason why we for the time being just not use something like https://github.com/jmazzi/crypt_keeper as a gem? I like it because it allows me to switch encryption encryptors. then at least we're safe for the time being while we can make our own custom encrypted for the rest i.e go for hashicorp's vault. It is actually really a bit crazy that this has been open since 2015.

masterkain commented 5 years ago
# lib/ext/doorkeeper_application.rb
require 'doorkeeper/orm/active_record/application'

class Doorkeeper::Application
  crypt_keeper :secret, encryptor: :active_support, key: Settings.app.crypt_keeper_key, salt: Settings.app.crypt_keeper_salt
end
# migration
Doorkeeper::Application.encrypt_table!
# spec
require 'rails_helper'

RSpec.describe Doorkeeper::Application, type: :model do
  context 'encryption' do
    subject { build(:oauth2_client_application) }

    it 'works' do
      # test our custom patch to doorkeeper
      expect(subject.crypt_keeper_fields).to eq([:secret])
    end
  end
end
nbulaj commented 5 years ago

Closing in favor of #320

mateuscruz commented 2 years ago

For those using Rails 7 with ActiveRecord, this monkey patch should work.


# config/initializers/doorkeeper.rb

Rails.application.config.to_prepare do
  class Doorkeeper::Application
    encrypts :secret, deterministic: true
  end
end

deterministic: true is only needed if you need to be able to query by the secret column.