cloudfoundry / log-cache

Archived: Now bundled in https://github.com/cloudfoundry/log-cache-release
Apache License 2.0
14 stars 11 forks source link

log-cache could bring UAA down #92

Closed paroxp closed 5 years ago

paroxp commented 6 years ago

 What

We've had a situation, where we've run some manual tests on log-cache and noticed the hits to UAA increased massively. This is caused by log-cache authenticating the JWT token, every time it receives the request.

See below for the journey.

We fear this could be problematic on a bigger scale.

I think it may be the same with CF API permissions check?

Potential solution

You could obtain a signing key from UAA once and cache it in memory, on the startup:

https://docs.cloudfoundry.org/api/uaa/version/4.20.0/index.html#token-keys

And self validate the keys as and when you need?

Journey

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/160478894

The labels on this github issue will be updated when the story is started.

aeijdenberg commented 5 years ago

I've seen the commit that looks like it caches responses from UAA.

Is there a good reason it doesn't instead keep the public key from UAA and validate tokens itself?

One of the key reasons to use short-lived JWTs for access tokens is specifically to avoid talking to an authorization server during processing, so it's surprising to see doing so encouraged.

If it's helpful, we have code that uses the kid attribute to cache and validate against here: https://github.com/govau/cf-common/blob/master/uaa/token_validator.go#L311

jtuchscherer commented 5 years ago

@aeijdenberg We were just following the recommendations of the UAA team.

Our goal was to simply protect UAA and our solution solves that in a simple way that is inline with the recommendations of the UAA team.

What exactly are your concerns?

aeijdenberg commented 5 years ago

@jtuchscherer - it's generally considered unnecessary to need to connect to an authorization server to validate a JWT - that's usually the point of a JWT. It's nearly the equivalent of typing 2 + 2 into a Google Search and having Google compute the result - it might work, but it would much cheaper and faster to just add them in your own head.

ie you can decode the JWT yourself, extract the key ID used to sign it, then, if you don't already have it cached, fetch that public key from UAA, then verify the signature yourself (https://tools.ietf.org/html/rfc7519#section-7.2).

It's not necessary to request that UAA does it for you (and since last I checked UAA has no facility for automated rotation of signing keys, and worse uses the same key for signing long-lived (30 days) JWT refresh tokens, you can be pretty much guaranteed you'll only need to fetch the public key once).

Last I checked other components of CF (such as cloud_controller_ng) validate tokens themselves in this manner.

keymon commented 5 years ago

I agree with @aeijdenberg . That is actually the point of the verification key and although caching UAA responses might work, you are still susceptible to DoS (e.g. https://github.com/cloudfoundry/log-cache/commit/eacb1d4ccae80dd38be74767a8002a82c82e5ad3#commitcomment-31715166) and do a lot of requests to UAA.

jtuchscherer commented 5 years ago

@keymon @aeijdenberg Sorry, that this took so long. But we finally reached consensus, that we are going to go ahead and implement offline token validation. It's in the backlog know, so it should be picked up rather soon.

toddboom commented 5 years ago

@aeijdenberg We just wrapped up the offline token validation based on the feedback you provided in the PR. I'd love to get your comments on the update before we close this out. Everything has been merged into master. Thanks!

/cc @keymon @paroxp

aeijdenberg commented 5 years ago

Thanks @toddboom - I think it's looking pretty good now, thanks for accepting the feedback.

The fews thing I'd change are a bit minor related to RefreshTokenKeys().

  1. I'd make this private - I can't see a need to call it from external users of this part?
  2. I'd make that method grab the mutex - right now it basically relies on the caller doing so which might be OK if it's private, and that's documented in a comment, but currently neither of those statements are true.

But the overall structure looks good!

toddboom commented 5 years ago

@aeijdenberg I did something slightly different, but it should have accomplished the intent of your suggestion. Let me know how that looks and maybe we can get this issue closed out. Thanks again!

aeijdenberg commented 5 years ago

@toddboom - it's close. https://github.com/cloudfoundry/log-cache/blob/5efc811ddb446b2f989011c5c6e0835b9337a870/internal/auth/uaa_client.go#L157 isn't guarded by a mutex though.

Technically that might be okay, depending on whether a map value swap is atomic or not (I'm not sure in Go if it is or not), but I don't think it's intended by the code as is?

toddboom commented 5 years ago

Ah, yep. Good catch. I sort of think that check should move into the other method anyhow. I'll give it a rework and ping you.

toddboom commented 5 years ago

@aeijdenberg Ok, I think we should be good now. I removed the mutex and switched to a sync.Map and used atomics where necessary. There's also a new test to try to exercise the concurrency so that the race detector can catch it.

Let me know if you've got any other suggestions. Thanks!

toddboom commented 5 years ago

Ok, this has landed in master and will be available when we cut the next 2.2.x release. Definitely let us know if you see any issues. Thank you all for the feedback on this!