duo-labs / py_webauthn

Pythonic WebAuthn 🐍
https://duo-labs.github.io/py_webauthn
BSD 3-Clause "New" or "Revised" License
856 stars 171 forks source link

Can't use `bytes` as user ID #187

Closed lgarron closed 9 months ago

lgarron commented 11 months ago

Given that the user handle is an ArrayBuffer in the WebAuthn API, I expected that the user_id value in this library would accept a random value of bytes. I was surprised to find that this is not supported.

This is probably an intentional design decision, but I thought I'd report that requiring it to be a UTF-8 caused more friction for me than allowing randomly generated bytes.

When combined with this line, I can't figure out how to generate an ID using random bytes. I guess you have to generate a random UTF-8 string instead?

If the library is going to be opinionated about adding an abstraction over the actual WebAuthn API, I think it would at least be useful to have a function to generate a random string ID.

MasterKale commented 10 months ago

Historically the idea was that the RP had some string format decided for user IDs. "USR1234567", "98765431", or in lieu of a value like this then something like a V4 UUID could be used too. However the RP wants to serialize user ID bytes into a string, I didn't want to impose any such requirement on that part of how an RP determines its user identifiers.

I take your point that this opinion isn't captured anywhere. I can add this to the README for a continued lack of a proper docs site.

lgarron commented 10 months ago

Historically the idea was that the RP had some string format decided for user IDs. "USR1234567", "98765431", or in lieu of a value like this then something like a V4 UUID could be used too. However the RP wants to serialize user ID bytes into a string, I didn't want to impose any such requirement on that part of how an RP determines its user identifiers.

If the API encourages reusing an existing format, to me that seems counter to the spec and common guidance?

Since the user handle is not considered personally identifying information in § 14.4.2 Privacy of personally identifying information Stored in Authenticators, the Relying Party MUST NOT include personally identifying information, e.g., e-mail addresses or usernames, in the user handle. This includes hash values of personally identifying information, unless the hash function is salted with salt values private to the Relying Party, since hashing does not prevent probing for guessable input values. It is RECOMMENDED to let the user handle be 64 random bytes, and store this value in the user’s account.

https://www.w3.org/TR/webauthn/#sctn-user-handle-privacy

I don't think I'd have trouble using this library correctly myself, but the current API abstraction seems rather "easy to hold wrong" for someone who doesn't already have years of experience with WebAuthn. 😕

Personally, I think it would be a lot safer for the library to have a way to generate 64 random bytes by default, with other use cases behind additional steps.

MasterKale commented 9 months ago

Strangely enough while working on #195 I discovered that webauthn.helpers.generate_user_handle() is a method that's existed in this package for a couple of years now...I completely forgot about it. It generates 64 random bytes and encourages its output to be used as the value user_id when generating registration options 😶‍🌫️

Personally, I think it would be a lot safer for the library to have a way to generate 64 random bytes by default, with other use cases behind additional steps.

In light of this discovery, would making user_id optional in generate_registration_options() and initializing it to generate_user_handle() feel like a "safer" or "simpler" way to encourage non-PII for user identifiers? I suppose then docs would suggest associating options.user.id with a valid user account for lookup later as user_handle in a .get() response.

MasterKale commented 9 months ago

@jwag956 continuing our conversation from https://github.com/duo-labs/py_webauthn/pull/195#issuecomment-1885722131 since it's relevant to this issue:

Since unicode may or may not be 1-1 with bytes - I wonder if it would be better for the py_webauthn API to take bytes - not string - in order to avoid an issue with upon encoding it is too long. Then I suppose encoding that using b64 encoding in order to create a JSON serializable value and having the front-end convert to a uint8array (as I know mine does today) . The same in the reverse upon authentication response. Of course - this behavior is pretty much exactly what py_webauthn and pydantic 1.0 did :-)

In the last 24-48 hours I've been thinking that I need to simplify this towards defining the user_id argument in generate_registration_options() as only ever bytes, which get base64url-encoded when options_to_json() gets called. This would align the behavior to both the WebAuthn L2 spec that has said the following about user.id (as you called out):

The user handle of the user account entity. A user handle is an opaque byte sequence with a maximum size of 64 bytes, and is not meant to be displayed to the user.

Additionally, in L3, new JSON serialization types have been defined that also say user.id should be a "Base64URLString" (an alias for DOMString but trying to communicate an encoding) when calling the upcoming PublicKeyCredential.parseCreationOptionsFromJSON() method:

https://w3c.github.io/webauthn/#dom-publickeycredentialuserentityjson-id

And the new PublicKeyCredential.toJSON() method will also base64url-encode userHandle bytes to make them JSON-friendly:

https://w3c.github.io/webauthn/#dom-publickeycredential-tojson

I think I need to give up my "developer experience" crusade of trying to keep user.id and userHandle something human-readable as it runs counter to explicit user privacy goals in WebAuthn and just follow the spec more closely here.

MasterKale commented 9 months ago

I've merged #197 that makes user_id into an Optional[bytes] argument. When no ID is specified the 64 random bytes will be used for user.id instead; these bytes will get base64url-encoded when passed into options_to_json() to stay consistent with JSON serialization logic we got into WebAuthn L3 (link):

Upon invocation, the client MUST convert the options argument into a new, identically-structured PublicKeyCredentialCreationOptions object, using base64url encoding to decode any DOMString attributes in PublicKeyCredentialCreationOptionsJSON that correspond to buffer source type attributes in PublicKeyCredentialCreationOptions.

This will go out as part of a bigger v2.0 release of this library. I'll follow up when that is available on PyPI.

MasterKale commented 9 months ago

Alright, this change is out in the latest webauthn==2.0.0 on PyPI:

https://github.com/duo-labs/py_webauthn/releases/tag/v2.0.0