MasterKale / SimpleWebAuthn

WebAuthn, Simplified. A collection of TypeScript-first libraries for simpler WebAuthn integration. Supports modern browsers, Node, Deno, and more.
https://simplewebauthn.dev
MIT License
1.39k stars 122 forks source link

Remove footgun related to random user IDs #530

Closed MasterKale closed 2 months ago

MasterKale commented 4 months ago

Describe the issue

Recent discussion over in the py_webauthn project (https://github.com/duo-labs/py_webauthn/issues/187) focused on how allowing strings for user IDs when generating registration options can encourage the use of PII since e.g. "emails are strings, so why not use that?" However, the WebAuthn spec specifically discourages this pattern and encourages RPs to use random 64 bytes as IDs:

The user handle of the user account entity. A user handle is an opaque byte sequence with a maximum size of 64 bytes, and is not meant to be displayed to the user.

Additionally, in L3, new JSON serialization types have been defined that also say user.id should be a "Base64URLString" (an alias for DOMString but trying to communicate an encoding) when calling the upcoming PublicKeyCredential.parseCreationOptionsFromJSON() method:

https://w3c.github.io/webauthn/#dom-publickeycredentialuserentityjson-id

And the new PublicKeyCredential.toJSON() method will also base64url-encode userHandle bytes to make them JSON-friendly:

https://w3c.github.io/webauthn/#dom-publickeycredential-tojson

In this project, though, I treat userID and userHandle as UTF-8 strings because it'd been my belief that this was better developer experience to keep these values human-readable.

I think I need to give up my "developer experience" crusade here, though, as it runs counter to explicit user privacy goals in WebAuthn and just follow the spec more closely here.

To achieve this I should make similar behavior changes here that correspond to the ones I made to py_webauthn in https://github.com/duo-labs/py_webauthn/pull/197 and...

  1. Make the userID argument in generateRegistrationOptions() an optional Uint8Array
  2. When userID is not specified then generate 64 random bytes and return that (with instructions to RPs to associate these bytes with whatever user account the RP's site uses internally.)
  3. Encode userHandle to base64url string instead in startAuthentication()
spendres commented 4 months ago

I am using them now for the userID. I suspect that they would fit within your proposed 64 byte Uint8Array type, but have not tested to determine if this would break my code.

Please consider using ULIDs for your userID generation given their benefits in data storage: more info

MasterKale commented 4 months ago

I still think by default it's a good idea for me to generate completely random values when userID is omitted. However in your case there's nothing about what I'm proposing that would prevent you from using ULIDs for userID, so long as you figured out a way to convert them to a Uint8Array that you're comfortable with.

rimu commented 4 months ago

so long as you figured out a way to convert them to a Uint8Array that you're comfortable with.

I have not...

I'm having some trouble with this - I let py_webauthn generate a user id from the backend (it gets encoded to "OPclcKTH6cjyjoRncpBrvKaepPz4eagbzFtOVnYCmANpUx0Vntm1lzlabOri5BF97CLNfTL440SIhbqwd459eQ") and then I'm converting that to a Uint8Array:

// Convert base64url encoded bytes into a Uint8Array. Used by passkeys.
const decodeBase64 = (encoded) => {
    return new Uint8Array(atob(encoded)
        .split('')
        .map((c) => c.charCodeAt(0)));
};

const decodeBase64Url = (input) => {
    try {
        return decodeBase64(input.replace(/-/g, '+').replace(/_/g, '/').replace(/\s/g, ''));
    }
    catch (_a) {
        throw new TypeError('The input to be decoded is not correctly encoded.');
    }
};

I get 64 elements:

64.

But then startRegistration() has an exception "TypeError: User ID was not between 1 and 64 characters".

Any tips?

MasterKale commented 4 months ago

I'm having some trouble with this - I let py_webauthn generate a user id from the backend (it gets encoded to "OPclcKTH6cjyjoRncpBrvKaepPz4eagbzFtOVnYCmANpUx0Vntm1lzlabOri5BF97CLNfTL440SIhbqwd459eQ") and then I'm converting that to a Uint8Array

@rimu Are you feeding output from generate_registration_options() in py_webauthn to startRegistration() in @simplewebauthn/browser? If so you're intended to pass in the JSON options as-is and let startRegistration() take care of converting a value like OPclcKTH6cjyjoRncpBrvKaepPz4eagbzFtOVnYCmANpUx0Vntm1lzlabOri5BF97CLNfTL440SIhbqwd459eQ into the bytes that navigator.credentials.create() expects.

rimu commented 4 months ago

Ah, yes, now I can see where startRegistration does the conversion. I still get the same error, though.

I'm using 9.0.1 which is a bit old. That is the version that <script src="https://unpkg.com/@simplewebauthn/browser/dist/bundle/index.umd.min.js"></script> provides. Would it help if I was using 9.0.3?

rimu commented 4 months ago

Solved by using a shorter user_id: https://github.com/duo-labs/py_webauthn/issues/199#issuecomment-1981888478

MasterKale commented 2 months ago

These changes are now available in the recently-published @simplewebauthn/browser@10.0.0, @simplewebauthn/server@10.0.0, and @simplewebauthn/types@10.0.0 ✌️

P4sca1 commented 2 months ago

@MasterKale Is it required to save the random generated user ID value in the database? From my understanding, it is randomly generated in generateRegistrationOptions but then it is never validated in the future. Do I miss something? I previously used internal user ids (ascending integer ids) for the userId field, but I think I can just ignore the field entirely with v10? What is the value even for? Does it need to be known in the backend?

wparad commented 2 months ago

@MasterKale Is it required to save the random generated user ID value in the database? From my understanding, it is randomly generated in generateRegistrationOptions but then it is never validated in the future. Do I miss something? I previously used internal user ids (ascending integer ids) for the userId field, but I think I can just ignore the field entirely with v10? What is the value even for? Does it need to be known in the backend?

@P4sca1, The userId is the userId, if you aren't using the userId how are you identifying who the user is?

P4sca1 commented 2 months ago

I create a HTTP session when the authentication / registration process starts which contains the user id and WebAuthn challenge. When verifying passkey authentication request, I select the Passkey with the given base64CredentialId. The resulting Passkey object contains the userId.

P4sca1 commented 2 months ago

Okay, I found it in the spec (emphasis mine):

A user handle is an identifier for a user account, specified by the Relying Party as user.id during registration. Discoverable credentials store this identifier and MUST return it as response.userHandle in authentication ceremonies started with an empty allowCredentials argument.

The main use of the user handle is to identify the user account in such authentication ceremonies, but the credential ID could be used instead. The main differences are that the credential ID is chosen by the authenticator and is unique for each credential, while the user handle is chosen by the Relying Party and ought to be the same for all credentials registered to the same user account.

Authenticators map pairs of RP ID and user handle to public key credential sources. As a consequence, an authenticator will store at most one discoverable credential per user handle per Relying Party. Therefore a secondary use of the user handle is to allow authenticators to know when to replace an existing discoverable credential with a new one during the registration ceremony.

https://www.w3.org/TR/webauthn-3/#user-handle

For the secondary use case, it reads like the spec suggests to use the same user id for all authenticators that belong to the same user. This would imply that using random user ids is not ideal?

P4sca1 commented 2 months ago

From https://www.w3.org/TR/webauthn-3/#sctn-user-handle-privacy:

It is RECOMMENDED to let the user handle be 64 random bytes, and store this value in the user account.

MasterKale commented 2 months ago

The main use of the user handle is to identify the user account in such authentication ceremonies, but the credential ID could be used instead. The main differences are that the credential ID is chosen by the authenticator and is unique for each credential, while the user handle is chosen by the Relying Party and ought to be the same for all credentials registered to the same user account.

There's also a subtle nuance in here that credential IDs are chosen by the authenticator, but user IDs are chosen by the RP. That makes user IDs "better" from the RP's perspective because it's a value that the RP controls and can predict the shape of vs the variability observable in credential ID lengths.

For the secondary use case, it reads like the spec suggests to use the same user id for all authenticators that belong to the same user. This would imply that using random user ids is not ideal?

One of the main goals of using random identifiers for user.id is to preserve user privacy by not using email addresses or usernames that could be used to link a user across multiple sites, even sites that don't use passkeys at all! I missed that the spec says "ought to be the same for all credentials registered to the same user account", but honestly if you're going this route (vs specifying user IDs yourself) you could just as easily use a lookup table to handle random user IDs for every credential that all link back internally to a single DB user ID 🤔

MasterKale commented 2 months ago

(That said I still personally think it's okay to simply look up users by credential ID; I haven't heard of anyone running into issues with this approach. It's stuff like this that makes it hard to suggest the "best" way to do anything with SimpleWebAuthn: RP use cases and risk models are so varied I try to find a middle ground that'll not make a ton of work for any one type of RP...)

P4sca1 commented 2 months ago

Thank you for the detailed answer and your continued work on the SimpleWebAuthn library! I decided to continue using the user ids from database (incremental primary keys) as the user handle. They are not random and allow to identify a user within my application (at least find the associated username), but I do not consider this to be a privacy issue for my application. There is no email or real names revealed. For authentication I continue looking up passkeys in my database based on the base64 credential id, instead of the user handle. This seems to be fine by the spec and is working fine. The user handle is not guaranteed to exist or exists in a different format for authenticators, that have been generated using previous registration options. The credential id is always there.

MasterKale commented 2 months ago

The user handle is not guaranteed to exist or exists in a different format for authenticators, that have been generated using previous registration options. The credential id is always there.

This is another reason I started looking up users by credential ID and never really looked back. Nowadays userHandle is almost always being returned, but not even L3 is going to make userHandle required so as of 2024 RPs will still have to prepare to handle its absence. And when userHandle is unavailable, how else can you look up the user? By credential ID...such is the way this stuff goes 😂