duo-labs / py_webauthn

Pythonic WebAuthn 🐍
https://duo-labs.github.io/py_webauthn
BSD 3-Clause "New" or "Revised" License
856 stars 171 forks source link

feat/remove-pydantic-dependency #195

Closed MasterKale closed 9 months ago

MasterKale commented 9 months ago

An experiment in simplifying the list of this package's dependencies. PR is currently a WIP while I continue to explore this.

Fixes #196.

jwag956 commented 9 months ago

Good morning - I tried this PR and my user_handle tests are failing. From looking at parse_authentication_credential_json it isn't treating user_handle as a b64 encoded URL - which I think it should (for JSON input - there really isn't anything else a client can send it that is proper JSON).

            response=AuthenticatorAssertionResponse(
                client_data_json=base64url_to_bytes(response_client_data_json),
                authenticator_data=base64url_to_bytes(response_authenticator_data),
                signature=base64url_to_bytes(response_signature),
                user_handle=response_user_handle,
            )

This is also somewhat suspect code (just ignoring non strings?)

    if isinstance(response_user_handle, str):
        response_user_handle = response_user_handle
    else:
        response_user_handle = None
jwag956 commented 9 months ago

I have a couple silly negative tests that are also failing. They create a junk JSON document and verify error. They are tripping up on parse_authentication_credential_json:

    # Appease mypy
    assert isinstance(json_val, dict)

This should probably be a real test and raise InvalidJSONStructure("Silly JSON")

MasterKale commented 9 months ago

(for JSON input - there really isn't anything else a client can send it that is proper JSON).

UTF-8 is valid string encoding too, which goes back to my comment in #180 about wanting to handle user.id and user_handle as UTF-8 specifically because I believe it's a better dev experience:

So userHandle is an interesting one in the spec. user_id, which this library types as str in generate_registration_options(), is JSON-serializable as-is so it doesn't undergo any transformation into base64url. If you then decode the string as UTF-8 to bytes on the front end during registration, just before you pass it in as options.user.id when calling .create(), then during authentication the value of userHandle will be bytes you can encode back to UTF-8 to get the exact same value you specified as user_id during registration.

I found two years ago, from a developer ergonomics story, that it's much better to not base64url-encode/decode user.id and userHandle and use UTF-8 instead.

I'm going to think through this now and adjust this PR accordingly.


This is also somewhat suspect code (just ignoring non strings?)

    if isinstance(response_user_handle, str):
        response_user_handle = response_user_handle
    else:
        response_user_handle = None

Good call out, thank you. I can definitely do better here.


    # Appease mypy
    assert isinstance(json_val, dict)

This should probably be a real test and raise InvalidJSONStructure("Silly JSON")

Nice suggestion 🚀

jwag956 commented 9 months ago

OK - I can work with UTF-8 as user_handle...

MasterKale commented 9 months ago

OK - I can work with UTF-8 as user_handle...

I'll be sure to document this behavior too, I promise 🙏

jwag956 commented 9 months ago

I am still struggling to get this to work with my sample front-end. I don't understand this code in def options_to_json():

        if isinstance(options.user.id, bytes):
            _user["id"] = bytes_to_base64url(options.user.id)
        else:
            _user["id"] = options.user.id

generate_registration_options already has done:

user=PublicKeyCredentialUserEntity(
            id=user_id.encode("utf-8"),
            name=user_name,
            display_name=user_display_name,
        ),

So it will always be 'bytes' and then convert it to base64?

MasterKale commented 9 months ago

So it will always be 'bytes' and then convert it to base64?

Right, this is current behavior that's confusing and definitely not desirable as per another comment I left over in #180:

You know what, I just realized that I've been base64url-encoding user.id the entire time, which means userHandle is coming back as the base64url-encoded version, not the string passed in as user_id into generate_registration_options(). I thought I'd fixed that... 🤦‍♂️

I've stashed a couple of approaches I plan on taking after merging this PR to fix this. I'll either A) always treat user.id and user_handle as UTF-8 strings, or B) allow UTF-8-string or bytes as values for user.id, base64url-encode user.id if it's bytes, and then treat that user identifier as UTF-8 when handling it later during authentication as user_handle.

tl;dr: for this PR the behavior will remain as-is. I'll fix this behavior afterwards as it's very confusing.

jwag956 commented 9 months ago

Sorry to be a pain here - this affects the front-end API and changing that needs careful consideration and of course breaks backwards compatibility. So front-ends (mostly JS I presume) must take user.id and b64decode into a Uint8Array to pass into credentials.create - then take the response from authentication (user_handle) and send it back as plain utf-8? Or send it back as b64encoded and have RP code do the decode (which will break if you change it).

I suppose I am asking if this could be 'fixed' (either way) PRIOR to releasing. (different PR is fine - just prior to release) and and that you give this a major release change (2.0).

As always - thanks for all the effort and consideration you put into the package... we are all very very appreciative.

MasterKale commented 9 months ago

So front-ends (mostly JS I presume) must take user.id and b64decode into a Uint8Array to pass into credentials.create - then take the response from authentication (user_handle) and send it back as plain utf-8? Or send it back as b64encoded and have RP code do the decode (which will break if you change it).

To get the most out of userHandle during authentication an RP using this library would have to know to base64url-decode user.id during registration after calling options_to_json(generate_registration_options(...)) and before calling navigator.credentials.create().

If a user of this library does base64url-decode user.id before calling WebAuthn's navigator.credentials.create(), then the bytes they get later for userHandle after a call to navigator.credentials.get() would be the same UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options().

If they do not base64url-decode user.id before calling WebAuthn's navigator.credentials.create(), then the bytes they get later for userHandle after a call to navigator.credentials.get() would always be the base64url-encoded UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options(). The RP would need to base64url-decode userHandle to bytes, and then encode those bytes to UTF-8 to get the same UTF-8 bytes they specified as the string argument user_id when calling generate_registration_options().

This...is potentially going to make migrating from v1.x to v2.0 of py_webauthn a little messy. I think the simplest migration path would involve base64url-encoding the string they pass in for user_id when calling generate_registration_options(). This is anticipating my following up this PR with a subsequent change to always treat the string value user_id as UTF-8 bytes and stop base64url-encoding it - it's not necessary, the value is already a string.

To reiterate, a goal of the upcoming v2 is for the UTF-8 bytes that an RP specifies for user_id (as a basic str, i.e. user_id="user_id_here") when calling generate_registration_options() be the same bytes that get returned for userHandle after a call to navigator.credentials.get(). I could then advise users of py_webauthn to UTF-8 encode these bytes on the front end when preparing JSON to submit to the server. That would at least make it much easier to talk about how one value becomes the other.

All of this said, I honestly don't think userHandle should have much value to RPs. Even during conditional UI, when no username is provided, the credential ID that corresponds to the public key that can successfully verify the signature coming out of WebAuthn's .get() should then be used internally to figure out which user to sign in. userHandle on the front end isn't signed over and so it can't really be trusted. I don't think this gets me off the hook to figuring out a path forward around this, admittedly...

I suppose I am asking if this could be 'fixed' (either way) PRIOR to releasing. (different PR is fine - just prior to release) and and that you give this a major release change (2.0).

I don't think I want to put much effort into trying to solve this with the current v1.x version of py_webauthn as I think it's kind of a breaking change all in its own. I'd rather bundle it with a more "drastic" change like this PR and break many things at once instead of coordinating one per major release.

jwag956 commented 9 months ago

As always - thank you for your detailed response..Part of the confusion for me is that string (unicode) != UTF-8 in python - and in python, converting a string to UTF-8 ("my string").encode() results in bytes - which isn't JSON encodable (you have to cast it back to string). So your goal of having user.id be a 'string' seems good but does anything care if it encoded as long as what is returned as part of authentication is the same as was passed into registration?

All that aside - I have everything working with this PR - with just the one modification to b64decode in my signin code. Thanks.

MasterKale commented 9 months ago

Part of the confusion for me is that string (unicode) != UTF-8 in python

I went digging into Python's Unicode support and it's my understanding "Unicode support" in Python commonly refers to UTF-8, based on https://docs.python.org/3/howto/unicode.html#encodings:

UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.)

As for the results of your testing...

All that aside - I have everything working with this PR - with just the one modification to b64decode in my signin code.

Thank you for taking the time to test this PR. I'll go ahead with it, but it won't go out immediately as v2.0. I'll be making a couple of other breaking changes (like the user ID stuff we've talked about in here) before rolling all of that into v2.0.

jwag956 commented 9 months ago

In python 3 all strings are unicode. It is true that utf-8 is the default encoding (e.g. "hi there".encode()) will convert the unicode string "hi there" to a UTF-8 encoded series of bytes. One thing I noticed was the W3C spec calls user.id "A user handle is an opaque byte sequence with a maximum size of 64 bytes, and is not meant to be displayed to the user."

Since unicode may or may not be 1-1 with bytes - I wonder if it would be better for the py_webauthn API to take bytes - not string - in order to avoid an issue with upon encoding it is too long. Then I suppose encoding that using b64 encoding in order to create a JSON serializable value and having the front-end convert to a uint8array (as I know mine does today) . The same in the reverse upon authentication response. Of course - this behavior is pretty much exactly what py_webauthn and pydantic 1.0 did :-)

I don't really care either way - my user_handle is a 64 bytes hex string. At a minimum it is probably worth documenting the restrictions on passing in an arbitrary unicode string.