base64url decode fails on valid base64url due to capital padding

Thalhammer / jwt-cpp

A header only library for creating and validating json web tokens in c++

https://thalhammer.github.io/jwt-cpp/

MIT License

865 stars 235 forks source link

base64url decode fails on valid base64url due to capital padding #205

Closed simonvanbernem closed 1 year ago

simonvanbernem commented 2 years ago

Describe the bug The base64url decoder does not accept "%3D" as padding, only "%3d" (with lowercase d). As far as I can tell, uppercase is also a valid representation of the padding, and even the predominant one, judging by the google results.

How To Reproduce As an example: The base64url-encoded version of "1" with uppercase D ("MQ%3D%3D") fails to decode.

Expected behavior "%3D" should also be accepted as padding.

Additional context This was discovered because Keycloak changes padding in the state parameter of an authorization request from "%3d" to "%3D" before passing it on to the redirect uri.

Thalhammer commented 2 years ago

Technically any jwt token with any padding at all is invalid. RFC7515 makes this pretty clear:

Base64 encoding using the URL- and filename-safe character set defined in Section 5 of RFC 4648 [RFC4648], with all trailing '=' characters omitted.

The actual jwt spec used to contain a similar sentence in the draft, but it got removed when it was split into two jws+jwt. (JWS describes the structure of a jwt, i.e. the three base64 vals separated by dots).

That being said, I 100% agree that we should at least try to accept malformated (but otherwise correct) input. The quick fix for your specific case would be to use a custom alphabet (Just copy the struct here to your code and replace the fill function with one returning an uppercase value. You can then pass it to the decode function like so:

decoded_jwt(token, [](const std::string& str) {
                  return jwt::base::decode<YOUR_ALPHABET>(base::pad<YOUR_ALPHABET>(str));
              });

Another simple fix would be to simple search for the first character thats not a valid base64 char or a dot and remove everything after it. The library is designed to handle that case.

prince-chrismc commented 2 years ago

Looks like bug https://datatracker.ietf.org/doc/html/rfc3986#section-6.2.2.1

Great catch!

One puzzle I didnt find a good answer for is when did the URI encoding become expected?

https://www.rfc-editor.org/rfc/rfc4648#section-5 (referred to by JWS RFC) uses a = and even https://www.rfc-editor.org/rfc/rfc7515.html#appendix-C only talks about = never escaped...

Perhaps theres a missing encoding step that's needed? 🤔

simonvanbernem commented 2 years ago

Technically any jwt token with any padding at all is invalid. RFC7515 makes this pretty clear

We use the base64url decoder to decode a base64url value that is not part of the jwt token (it's a url parameter in the redirect uri string), so I don't think the contents of the RFC are relevant.

The current situation is at the very least a bit misleading: There is a decoder alphabet is called "base64url", but it doesn't actually decode base64url (because that can contain padding), but the just the jwt-token-subset of base64url. I didn't know that, so I used the decoder on a base64url encoded string that is not part of the jwt token (it's a url parameter in the redirect uri string), and so the decoding failed.

One could rename the alphabet to something like "base64url_without_padding" to communicate the restriction more clearly, but I think just allowing uppercase padding would be the better solution.

Thanks for the workaround suggestion, we just ended up replacing "%3D" with "%3d" in a preprocessing steps, which works but is a little annoying :D

Thalhammer commented 2 years ago

We use the base64url decoder to decode a base64url value that is not part of the jwt token

Ahhh ok that makes a lot more sense. It IS a base64url alphabet (the difference being that + is replaced with -, the / with _ and the = being percent encoded), it's just expecting lowercase encoding. I agree that this is not ideal, but I don't really know how we could change this without breaking everyone that uses a custom alphabet. We would need at the very least a method to provide more than one padding string or (better) some sort of hook point that preprocesses the string (in this case doing urldecode).

prince-chrismc commented 2 years ago

We can have a fancy template deduction enable_if for string or vector of strings 🤔 against the fill to support more complex alphabets.

Or maybe just case insensitive search might be good enough, replace https://github.com/Thalhammer/jwt-cpp/blob/142d2d38e7bb1ad17b9787d776589e719d7494cb/include/jwt-cpp/base.h#L202 maybe with https://stackoverflow.com/a/3152296/8480874?

Thalhammer commented 2 years ago

The problem with doing a case insensitive search by default that it causes problems if the lowercase version of the padding is part of the alphabet. While thats not the case for any base64 version I know of, we can't relly on that since someone might be using a custom alphabet that does. Its unlikely, but not impossible. In which case their code would break at runtime and only sometimes, which is kinda the worst case imho.

prince-chrismc commented 2 years ago

Instead of doing a "case insentive" version I did a substring match on multiple fill patterns. This should avoid the concerns + there's a dedicated alphabet for this exact RFC. If someone writes their own hopefully they have unit tests 🤞