DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0
7 stars 2 forks source link

Add support for combined bearer token to Swagger UI #4324

Open hannes-ucsc opened 2 years ago

hannes-ucsc commented 2 years ago

https://github.com/DataBiosphere/azul/issues/3833#issuecomment-1190516562

In Swagger UI, combine access and ID token into a JSON structure

{
    "access": …,
    "id": …
}

serialize that structure to JSON without indent but with sorting the keys, base64-encode (urlsafe) the resulting sequence of bytes and pass as bearer token to API when user initiates requests to Azul.

In Azul, try decoding that structure. If that fails, assume that a sole access token was passed as before. If it succeeds, skip the request to /userinfo and use the ID token embedded in the structure instead. The rest of the request processing continues as usual: the list of accessible sources is looked up in the cache and used if found (hit), or requested from TDR if not (miss).

This does leave the door open to an attacker sending a request in which the access and ID tokens in the combined bearer token belong to different users. If both tokens belong to the attacker (under two different identities), the damage is limited: we may associate a list of sources accessible to the attacker's first identity with their second identity and cache it thereunder. If the access token is stolen, we make the sources available to the attacker, and cache it under the attacker's identity. No more information is leaked in this case than if the attacker passed just the stolen access token. The only difference is that we may end up with two cache lines, one for the rightful owner of the access token, and one for the attacker. The normal mechanisms for preventing access token theft in OAuth 2.0 will need to suffice.

If the attacker steals another user's ID token and passes it in combination with their own access token, the user's cache line will be poisoned with the sources accessible to the attacker, and count against their number of in-use access tokens. This could result in denial-of-service to the legitimate user (a 403 due to the session limit or a response that's not reflecting the user's actual access). To prevent this scenario, an ID token that was extracted from the combined bearer token must be marked as untrusted, and, on a cache miss, be verified using /userinfo (passing the access token to retrieve an ID token). Furthermore, cache hits with a previously unseen access token must be treated as a cache miss, to prevent the cache poisoning described above.

ID tokens that were determined using /userinfo before the cache lookup—because a sole access token was passed in—should be marked as trusted and don't need to be verified a second time, though they still should be validated as defined in RFC 7519.

melainalegaspi commented 2 years ago

@hannes-ucsc to provide description.

hannes-ucsc commented 2 years ago

The least defined aspect of the design in this ticket is how to get the Swagger UI to pass a custom token, the combined bearer token. Here are a few links:

https://github.com/swagger-api/swagger-ui/issues/4084 https://github.com/swagger-api/swagger-ui/issues/7561

and, from a comment on that last issue, someone customizing the Swagger UI to obtain an ID token using the implicit flow. We also use the implicit flow. Part of the customization is passing the ID token instead of the access token. Somewhere in that code path we should be able to combine both tokens. I expect most of the time spent on this ticket to be used to figure this aspect out.