Support for long-lived API tokens

Some API operations like downloading many large files via a curl manifest may take hours to complete. Google access tokens expire in one hour. We should offer an alternative authentication method that uses longer-lived tokens, also known as API tokens.

After reading about OAuth 2.0, my stance is that it is OK to expose a refresh token to single-page applications (SPAs) as long as the OAuth 2.0 client secret that was used to obtain the refresh token is kept, well, secret. Resource servers will only accept access tokens, not refresh tokens, and to get a fresh access token, the application (the Azul server) would need to make a POST request to the authorization server's /token endpoint passing the refresh token and the client secret. IOW, as long as we keep the client secret secure on the Azul server, we can expose the refresh token to the Data Browser (DB).

Note that DB and Azul use different OAuth 2.0 client credentials. It would NOT be safe to have Azul generate the refresh token using the DB's client secret and then expose the resulting token in the DB since Azul doesn't own that client and can't guarantee its secrecy. Similarly, it would not be secure to have Azul and DB use the same OAuth 2.0 client.

The API token is derived from the refresh token. The derivation function does not need to be cryptographic. It would be totally fine to have the API token be the refresh token, but I think that we want to wrap the refresh token in our own JSON structure and base64-encode that structure (just like JWT), in order to make room for future extensions. The JSON should be small and have a key denoting the version of the document shape, even if there is no JSON schema document describing that shape.
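To illustrate the wrapping, here is a minimal sketch in Python. The key names `version` and `refresh_token`, the version number 1, and the use of unpadded URL-safe base64 are assumptions made for illustration, not a decided format.

```python
import base64
import json

API_TOKEN_VERSION = 1  # hypothetical version of the document shape


def encode_api_token(refresh_token: str) -> str:
    """Wrap the refresh token in a small versioned JSON document and
    encode that document as unpadded URL-safe base64."""
    document = {'version': API_TOKEN_VERSION, 'refresh_token': refresh_token}
    raw = json.dumps(document, separators=(',', ':')).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()


def decode_api_token(api_token: str) -> str:
    """Recover the refresh token, rejecting unknown document versions."""
    padding = '=' * (-len(api_token) % 4)  # restore the stripped padding
    document = json.loads(base64.urlsafe_b64decode(api_token + padding))
    if document.get('version') != API_TOKEN_VERSION:
        raise ValueError('Unsupported API token version')
    return document['refresh_token']
```

Since the encoding is not cryptographic, anyone holding the API token can recover the refresh token. That is acceptable under the reasoning above because the refresh token is useless without Azul's client secret.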

If an attacker manages to steal an API token from the DB session on a user agent, they can then make requests to Azul on behalf of the user and read all (meta)data the user has access to in Azul or TDR. If they extract the refresh token from the API token, they wouldn't be able to interact with other resource servers on behalf of the user (since those require access tokens) or get an access token (since that requires the client secret). If the attacker manages to steal an access token, they can do the same, except only for one hour. We should therefore minimize the exposure of the API token on the DB and Swagger UI clients so that it is not persisted on those clients and is not involved in making any other Azul requests made by them. The user should be presented with a UI from which they can copy the API token (or the prepared curl command line with the token embedded) but the architecture should not require storing the API token in the browser session. The API token should be obtained via a POST request to Azul (a new /token endpoint) and returned in the response body so that it doesn't occur in the browser history. Furthermore, getting an API token from Azul should require consent from the user for Azul to obtain the refresh token, separately from the consent the user gave the DB or the Swagger UI. If it only took an access token to obtain an API token, an attacker would only need to steal an access token to act on the user's behalf for a long time.

Getting a refresh token requires getting consent from the user in an authorization code flow. The resulting authorization code can then be used to request a refresh token. Getting consent involves two redirects: one from the application server (Azul) to the authorization server (Google, at a fixed and known URL) and another one from the authorization server back to the UI (Swagger UI or DB). Care must be taken that the second redirect only points to a trusted site. The URLs of any trusted UIs must be registered with the authorization server as valid redirect locations for the Azul OAuth client credentials. Before registering a trusted UI there, the Azul engineer would need to verify that the UI does not persist or otherwise misuse the API token.

What follows is a typical event sequence that outlines how we plan to implement API tokens:

1) User visits DB using a web browser (user agent)

2) User initiates the Export Data flow for a curl manifest

3) DB interacts with Azul's /fetch/manifest/files endpoint, passing the user's access token if they are logged into the DB

4) At the end of the flow, the DB displays the curl command returned by Azul. The command line contains a -H flag for the Authorization header with the user's access token. If the user is signed into the DB, the same page also contains a button to optionally obtain a command line with a longer-lived API token. The button is not offered to anonymous users: if it were, providing credentials would invalidate the cached manifest because different credentials can yield different manifest contents. The manifest has already been generated and we don't want to trigger a regeneration this late in the flow.

6) The user clicks the button. The DB makes a POST request to Azul's /token endpoint. The request should not include any Authorization headers. The POST should be done in an iframe or window such that the redirect response does not navigate away from the main DB page.

7) Azul returns a response redirecting to the Google authorization server at https://accounts.google.com/o/oauth2/v2/auth with

- `client_id` set to Azul's client ID
- `redirect_uri` set to Azul's `/token` endpoint
- `response_type` set to `code` so that we get an authorization code back
- `scope` set to `openid email` so that we can later exchange the authorization code for an identity token. We need to know the user's identity so that we can associate persistent state with it.
- `access_type` set to `offline` so that we can later exchange the authorization code for a refresh token
- `state` set to TBD (recommendation is a nonce)
- `include_granted_scopes` not set
- `login_hint` not set
- `prompt` not set so that the user is only prompted if they haven't already given consent to Azul
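The redirect target in step 7 could be assembled roughly as follows. This is only a sketch: the function name is hypothetical, and `state` is supplied by the caller because its value is still TBD. Parameters listed above as "not set" are simply omitted.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth'


def authorization_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL that Azul's /token endpoint redirects the user agent to."""
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'response_type': 'code',    # ask for an authorization code
        'scope': 'openid email',    # needed for the identity token
        'access_type': 'offline',   # needed for the refresh token
        'state': state,             # value TBD, a nonce is recommended
        # include_granted_scopes, login_hint and prompt intentionally omitted
    }
    return GOOGLE_AUTH_ENDPOINT + '?' + urlencode(params)
```

A caller might generate the nonce with `secrets.token_urlsafe()` from the standard library and remember it so that it can be compared with the `state` query parameter in step 10.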

8) User agent follows redirect inside iframe/window

9) If prompted, user gives consent

10) The authorization server redirects back to Azul's /token endpoint with either an `error` or a `code` query parameter.

11) User agent follows redirect inside iframe/window, makes GET request to Azul's /token endpoint

12) Azul extracts authorization code from request and makes a POST to Google authorization server's /token endpoint, with

- `client_id` set to Azul's client ID
- `client_secret` set to Azul's client secret
- `code` set to the code from the incoming GET
- `grant_type` set to `authorization_code`
- `redirect_uri` set to TBD (Azul won't follow any redirect in the response to the POST)
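Steps 10 to 12 might look like the sketch below. The function names are hypothetical, and `redirect_uri` is left as a caller-supplied argument because its value is TBD above. The token endpoint URL is the one given in Google's web-server documentation. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

GOOGLE_TOKEN_ENDPOINT = 'https://oauth2.googleapis.com/token'


def code_from_callback(query_params: dict) -> str:
    """Extract the authorization code from the query parameters of the
    GET request that the authorization server redirected to."""
    if 'error' in query_params:
        raise PermissionError('Authorization failed: ' + query_params['error'])
    return query_params['code']


def token_exchange_request(code: str,
                           client_id: str,
                           client_secret: str,
                           redirect_uri: str) -> Request:
    """Prepare the POST that exchanges the authorization code for tokens."""
    body = urlencode({
        'client_id': client_id,
        'client_secret': client_secret,  # must never leave the Azul server
        'code': code,
        'grant_type': 'authorization_code',
        'redirect_uri': redirect_uri,    # value TBD in the list above
    }).encode()
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return Request(GOOGLE_TOKEN_ENDPOINT, data=body, headers=headers, method='POST')
```

Passing the returned object to `urllib.request.urlopen` would perform the exchange. Any other HTTP client would do equally well.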

13) Google authorization server sends back JSON with an id_token and an access_token and, if this is the first time the user has given consent to Azul, a refresh_token.

14) If refresh_token is present, Azul stores it in a DynamoDB table under a key that is composed of the iss and sub claims from the id_token (a JWT, https://datatracker.ietf.org/doc/html/rfc7519). If it is absent, Azul retrieves the refresh_token from DynamoDB using the same key. The key should be present in DynamoDB because the user had previously given consent. Azul also stores the access token and its expiration using the refresh token as the key.
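The store-or-look-up logic of step 14 can be sketched as follows. A plain dict stands in for the DynamoDB table, the key format is a hypothetical choice, and the sketch decodes the id_token payload without verifying its signature. Whether signature verification is needed for a token received directly from Google is left to the actual implementation.

```python
import base64
import json


def jwt_claims(id_token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    _header, payload, _signature = id_token.split('.')
    padding = '=' * (-len(payload) % 4)  # JWT segments are unpadded base64url
    return json.loads(base64.urlsafe_b64decode(payload + padding))


def user_key(claims: dict) -> str:
    """Compose the storage key from the iss and sub claims (hypothetical format)."""
    return json.dumps([claims['iss'], claims['sub']])


def resolve_refresh_token(table: dict, token_response: dict) -> str:
    """Store a newly issued refresh token, or look up the one stored when
    the user first gave consent. `table` stands in for DynamoDB."""
    key = user_key(jwt_claims(token_response['id_token']))
    refresh_token = token_response.get('refresh_token')
    if refresh_token is None:
        # Raises KeyError if the user never gave consent through Azul
        refresh_token = table[key]
    else:
        table[key] = refresh_token
    return refresh_token
```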

TBD: Not sure what happens if user gave consent with implicit flow on Swagger UI. We'll find out.

15) Azul wraps refresh token in API token JSON and sends back response with token as body.

16) DB extracts API token from iframe/window.

TBD: If the extraction turns out to be problematic, we'll discuss remedies. Azul could set a CORS header in its response, allowing the DB origin to access the JSON response in the iframe. Azul could send back an HTML page with JS that uses `window.postMessage` to pass the token to the main DB page. Azul could send back a redirect to a DB URL that was passed as a parameter of the initial POST to Azul's `/token` endpoint, and to which Azul appends a fragment with a `code` to be exchanged via yet another POST to Azul's `/token` endpoint for the API token. Obviously, the latter option is the most complicated and I hope we won't need it.

17) DB closes iframe/window and repeats the request to Azul's /fetch/manifest/files endpoint with an Authorization header set to the API token.

18) Azul uses the refresh token inside the API token to look up the most recent access token and its expiration. If the access token is expired, it uses the refresh token to obtain a fresh access token as described in https://developers.google.com/identity/protocols/oauth2/web-server#offline and writes the new access token to DynamoDB under the refresh token as a key. Azul stashes the refresh and access token in Chalice's current request object.
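The look-up-then-refresh logic of step 18 might be sketched like this. A dict stands in for the DynamoDB table, and the `refresh` callable is a hypothetical stand-in for the POST to Google's token endpoint, returning the new access token and its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


def fresh_access_token(cache: dict,
                       refresh_token: str,
                       refresh: Callable[[str], Tuple[str, float]],
                       now: Optional[float] = None) -> str:
    """Return a cached access token that has not expired, refreshing
    and re-caching it under the refresh token if necessary."""
    now = time.time() if now is None else now
    entry = cache.get(refresh_token)
    if entry is None or entry['expiration'] <= now:
        access_token, expires_in = refresh(refresh_token)
        entry = {'access_token': access_token, 'expiration': now + expires_in}
        cache[refresh_token] = entry
    return entry['access_token']
```

A real implementation would probably refresh slightly before the recorded expiration so that the access token does not expire while a request to TDR is in flight.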

19) Azul checks the cached manifest's validity and, if necessary, kicks off the generation of a fresh manifest. The likelihood of this is low since both access tokens (the one passed by DB and the one derived by Azul) refer to the same user. As long as neither the Azul index nor the user's access changes, the cached manifest will be valid.

20) If the cached manifest is valid, Azul responds with a 302-inside-200 response pointing at the manifest and including a command line that mentions the API token, not the access token. If the cached manifest is invalid, Azul responds with a 301-inside-200 response pointing back at itself.

TBD: We might want to add a fail-fast option by allowing DB to pass the manifest's object key to `/fetch/manifest/files`. Currently that only works for the non-fetch `/manifest/files` endpoint. If an object key is passed, Azul would fail instead of generating a new manifest should the cached one be invalid.

21) The DB follows 301-inside-200 responses, if any. Once the final 302-inside-200 response with the command lines is returned, DB renders the command line on the page. The curl command line lists the API token.
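For illustration, a client following these responses could look like the sketch below. It assumes that the JSON body of each response carries `Status`, `Location` and, optionally, `Retry-After` keys. Those key names, the `get_json` callable and the hop limit are assumptions of this sketch, not something specified above.

```python
import time
from typing import Callable


def follow_fetch(url: str,
                 get_json: Callable[[str], dict],
                 sleep: Callable[[float], None] = time.sleep,
                 max_hops: int = 100) -> dict:
    """Follow 301-inside-200 responses until a 302-inside-200 arrives
    and return the body of that final response."""
    for _ in range(max_hops):
        body = get_json(url)
        status = body['Status']
        if status == 302:
            return body  # manifest is ready
        elif status == 301:
            sleep(body.get('Retry-After', 0))  # wait before polling again
            url = body['Location']
        else:
            raise RuntimeError('Unexpected status %r' % status)
    raise RuntimeError('Manifest not ready after %d requests' % max_hops)
```

Here `get_json` would issue the request with the Authorization header set to the API token and return the parsed JSON body.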

22) User pastes the command line into a terminal. The command line consists of two curl invocations: one against the /manifest/files endpoint to actually download the manifest, the second one to process the individual file URLs in the manifest. Both invocations contain the API token in the -H Authorization … flag. Consequently, when the second curl invocation requests each file URL, it passes along the API token. When handling all these requests, Azul first retrieves and, if necessary, refreshes the access token as outlined in step 18, then continues normally. Any TDR interactions are made with a guaranteed fresh access token.

References:

Discussion of Google's standard OAuth 2.0 implementation:

https://developers.google.com/identity/protocols/oauth2/web-server

Discussion of Google's OIDC extension to its OAuth implementation:

https://developers.google.com/identity/protocols/oauth2/openid-connect

DataBiosphere / azul

Support for long-lived API tokens #3328