Weasyl / weasyl

The website.
https://www.weasyl.com
Apache License 2.0
117 stars 30 forks source link

API keys should not be stored in the database unencrypted #235

Open hyena opened 7 years ago

hyena commented 7 years ago

These are effectively passwords and should be treated as such. We should hash/verify them the same way. Presently there's a security risk insofar as a database compromise would compromise all API keys.

Fixing this properly will be an irreversible migration. As such we need to resolve the pending database alembic schema issues first.

hyena commented 7 years ago

n.b. This may be a won't fix because it would remove the ability to list API keys on the site after their creation.

kfkitsune commented 6 years ago

So I've been thinking. There is a way to do this correctly, but it would need an API change.

Right now middleware.py does a direct comparison between the value in the X-Weasyl-API-Key header, and then attempting to pull the userid that corresponds to the token. As below:

api_token = request.headers.get('X_WEASYL_API_KEY')
    authorization = request.headers.get('AUTHORIZATION')
    if api_token is not None:
        # TODO: If reification of userid becomes an issue (e.g. because of userid changing after sign-in) revisit this.
        # It's possible that we don't need to reify the entire property, but just cache the result of this query in a
        # cache on arguments inner function.
        userid = d.engine.scalar("SELECT userid FROM api_tokens WHERE token = %(token)s", token=api_token)
        if not userid:
            raise HTTPUnauthorized(www_authenticate=('Weasyl-API-Key', 'realm="Weasyl"'))
        return userid

So since the token currently needs to match the DB record, it needs to match what is presented (or the SQL query wouldn't return correctly.

As such, we'd need to alter the api_tokens table to be something like:

api_tokens = Table(
    'api_tokens', metadata,
    Column('userid', Integer(), primary_key=True, nullable=False),
    # Column representing a bcrypt hash of a valid API key.
    Column('token_bcrypt_hash', String(length=100), nullable=False),
    # Column representing the KDF-encrypted Fernet token (which can be decrypted later to be return the plaintext which corresponds to the bcrypt hash
    Column('fernet_token', String(length=420), nullable=False),
    # Column representing the unique identifier of the desired API key (to permit an identifier header to pull out the bcrypt hash.
    Column('token_identifier', String(length=100), primary_key=True, nullable=False),
    Column('description', String()),
    default_fkey(['userid'], ['login.userid'], name='api_tokens_userid_fkey'),
)

In doing so, we'd be able to accomplish the following:

However, the API would need to be modified in the following way

Effectively, this transitions the code to the following:

api_token = request.headers.get('X_WEASYL_API_KEY')
api_token_identifier = request.headers.get('X_WEASYL_API_KEY_IDENTIFIER')
    authorization = request.headers.get('AUTHORIZATION')
    if api_token is not None:
        # TODO: If reification of userid becomes an issue (e.g. because of userid changing after sign-in) revisit this.
        # It's possible that we don't need to reify the entire property, but just cache the result of this query in a
        # cache on arguments inner function.
        userid, token_bcrypt_hash  = d.engine.execute("""
            SELECT userid, token_bcrypt_hash
            FROM api_tokens 
            WHERE token_identifier = %(token_identifier)s
        """, token_identifier=api_token_identifier ).first()
        if not bcrypt.checkpw(api_token.encode('utf-8'), token_bcrypt_hash):
            # If the bcrypt hash doesn't match, authentication fails;
            raise HTTPUnauthorized(www_authenticate=('Weasyl-API-Key', 'realm="Weasyl"'))
        return userid

I haven't run the speed calculations yet, but since bcrypt does take time to hash, we could drop the rounds to something lower to speed up the access (4 is the lowest value pyca/bcrypt can use for the work factor, though we could use a slightly higher work factor, depending on the speed).

As of right now, these are my current thoughts on this, if we don't want to store the tokens plaintext in the DB. There may be better options, but this is where my current thoughts are on this.

kfkitsune commented 6 years ago

And now that I think about this further, in situations where the user's password needs to be forcibly changed (e.g., forgotten password recovery), that necessarily invalidates the KDF stored data. So a forgotten password would invalidate viewing the stored API tokens.

Similarly, if I was to extend the KDF encryption to 2FA tokens, a forgotten password would block 2FA from succeeding, thus blocking the user out entirely. I was in the process of creating a branch to not have to have a symmetric Fernet key for each user's 2FA secret; it would have worked, right up until the point where the user forgot their password. As such, in that situation 2FA would either block the user's account, or fail back to 1FA (possession of the email account).

I'm not entirely sure how to securely bind the encryption information on a per-user basis, while not storing it in the DB. The alternative is to use another Fernet encryption key to do symmetric encryption against the API tokens. Mind, that shifts the key storage to disk, which isn't necessarily ideal.

As a note, however: GitHub treats personal access tokens like passwords it seems, and it doesn't display the value of the token once it is generated. As such, on the surface it seems like they may be taking a token (such as 5WWc6ep9cMiBUosktnmkc9M5YHCeVd (just a random string)), then hashing it on the backend before storing it. So from the standpoint of, "What has been done before?" then we could be justified in hashing the tokens, and only showing the description for the token (optionally showing the first/last 5 characters for user convenience purposes, if desired).

charmander commented 6 years ago

You don’t need to use bcrypt – the keys are long and random. I’d like to move away from keys that grant complete account access altogether, though…

BusterNeece commented 5 years ago

Considering the increasing spate of credential leaks happening around the web (with many affecting the furry community in particular), multiplied by the somewhat increased risk imposed by the source code for Weasyl (and this issue) being public, I would recommend expediting the resolution of this issue.

If an API key making an authenticated call to the Weasyl API is capable of conducting all of the same actions as an authenticated user, then it should be treated as a password: displayed once to the user with a strong notice that it won't be visible again, then hashed into the database using modern best practices, which include splitting the token into a "selector" and "verifier" to avoid database timing attacks. These split tokens could still easily be presented to the user as a single API key with a delimiting character, then processed as such on the backend.

While implementing ParagonIE's suggested API signing mechanism may indeed be overkill for this use-case, it's another route that could be taken to help harden this surface area.

charmander commented 5 years ago

@SlvrEagle23 A timing attack on the current plan isn’t a vulnerability (it extracts the hash of a value that can’t be brute-forced). But we also need to replace this whole thing with OAuth. =/