Optimise TokenUtils parsing

noetro commented 2 years ago

Changes

Please describe both what is changing and why this is important. Include:

_We've been using the project for a while and I'm a fan. I was in for our 4.0 upgrade and thought I would see if I could find some small thing to start giving back. This optimises the TokenUtils parsing by relying on the JWT format is very specific. In particular:

The format is checked to be good before any array is allocated.
In the case of alg='none', two array allocations are not required._

References

Please include relevant links supporting this change such as a:

No references to note

Testing

Existing TokenUtils test had good coverage. Added a couple of test cases for corner case odd string inputs.

[X] This change adds test coverage
[X] This change has been tested on the latest version of Java or why not

Checklist

[X] I have read the Auth0 general contribution guidelines
[X] I have read the Auth0 Code of Conduct
[X] All existing and new tests complete without errors

jimmyjames commented 2 years ago

👋 hi @noetro, thanks for the contribution! Appreciate your willingness to contribute 🙇. Also really like the additional tests (those would be good to add regardless of this PR).

Can you expand a bit on the optimizations benefit in this change? As you noted, I see that we aren't allocating an array prior to ensuring the format is correct, and also not re-allocating in case the alg is none. Any thought to how those optimizations compare to the repeated calling of indexOf and substring? Just want to ensure the optimizations are enough to justify the additional code and handling introduced here. Thanks!

noetro commented 2 years ago

Hi @jimmyjames - Sure, happy to discuss. You don't get around the indexOf and substring calls in any scenario here. Have a look at the implementation of String.split(regex) that the current code uses (at least on my OpenJDK). It has a fast path to try to detect a single character regex and uses a similar but more generic implementation (it needs to grab N parts) that is basically a while -> indexOf -> substring pattern.

So, it's a similar implementation in terms of dicing the string. We can just be more specific as we know exactly a pattern we would accept. We can get out early if we don't have two-and-only-two delimiters, and we can avoid allocating a large array if we try to parse a large string with many delimiters.

noetro commented 2 years ago

I also took a quick look at a few other open source libraries. It seems like a mixed bag of people doing split(regex) (Vert.x JWT auth) vs. something like what I have here with a few indexOf (Jose JWT)

jimmyjames commented 2 years ago

@noetro thanks for the info! Change looks good, just failing on a couple checkstyle issues. You can see the issue locally by running ./gradle clean check. If you can push another commit to resolve those issues, we can merge this PR. Thanks!

noetro commented 2 years ago

Oops, missed there was checkstyle here. I pushed a commit correcting them @jimmyjames

jimmyjames commented 2 years ago

Thanks @noetro! Looks like there are some additional testing paths that need to be covered: https://github.com/auth0/java-jwt/pull/611/checks?check_run_id=8736927827

Once we get tests for those paths we should be good to go. Thanks again!

noetro commented 2 years ago

Added a test case for "no parts" @jimmyjames . Intellij at least reports 100% test coverage. I haven't used codecov before - I'm not sure I agree that all 3 added lines lacked coverage. Those thrown exceptions were asserted in the TokenUtilsTest.

We were missing a test case for no delimiters in the input string though, so I added shouldThrowOnSplitTokenWithNoParts and that moved Intellij reporting to 100% on TokenUtils when running TokenUtilsTest

jimmyjames commented 2 years ago

@noetro I think I know what the issue is. It might be an oddity with the jacoco test coverage, but we should update the TokenUtils#wrongNumberOfParts method to just return a new JWTDecodeException, instead of throwing it. Because in use, currently the change calls throw wrongNumberOfParts(2). If you change wrongNumberOfParts to simply return a new exception, that will be correct and also I bet it fixes the coverage issue. Thanks!

jimmyjames commented 2 years ago

I added a new commit that changes wrongNumberOfParts to return a new JWTDecodeException, as discussed.

We're going to release 4.1.0 early next week, then will merge this change. Thanks again!

@noetro I think I know what the issue is. It might be an oddity with the jacoco test coverage, but we should update the TokenUtils#wrongNumberOfParts method to just return a new JWTDecodeException, instead of throwing it. Because in use, currently the change calls throw wrongNumberOfParts(2). If you change wrongNumberOfParts to simply return a new exception, that will be correct and also I bet it fixes the coverage issue. Thanks!

noetro commented 2 years ago

Interesting find @jimmyjames - thanks for pushing.

noetro commented 2 years ago

Sounds good @jimmyjames - I'll have another one for you after we merge this that's a slightly bigger change but makes a noticeable performance improvement on deserialisation.

auth0 / java-jwt