Note to self: focus of this paper is on scaling. There might be other approaches to the same problem.

Problem

Motivation: TLS is good enough for point-to-point authentication.

But how do we handle request-level authentication between trusted services that communicate through less-trusted intermediary hosts, or "proxies"?

Solution

If we use TLS-based authentication as is, authZ will fail, because the server sees the proxy's TLS identity rather than the client's. We can circumvent this by letting the proxy act as the client. However, this approach has a larger attack surface, as we need to trust and secure both proxy and client equally. So it is not scalable.

This leads us to token-based authentication. Basically, we pass an unforgeable token from client to server.

Assumption: the proxy does not run malicious code that actively changes the user request.

Public-Key variant

As each host is already associated with a certificate, which contains the client host's public key and other info, we can create a token based on this certificate.

The idea is token = host_cert || req_metadata || sig, where sig = Sign(client_secret_key, host_cert || req_metadata). The request metadata (req_metadata) includes information about the proxies used, the resources requested, the actions to be performed, and a timestamp.

(Yes, we don't sign over the request data. This is where the assumption above comes into play. But at least the sig part of the token is unforgeable without the client's secret key.)

The verifier first validates the presented host_cert using the master public key. Then it extracts the client's public key from host_cert and uses it to verify sig over host_cert || req_metadata.

Some notes:

A client can reuse the same token for requests to 2 different hosts running the same server service. But 2 hosts running the same client service have totally different tokens from a server's POV (due to different host_cert).
This variant is used internally in FB for s2s authentication.

Upsides:

no extra dependency
generally reliable and simple
as it uses asymmetric crypto, any service along the path can independently verify the token and break early if verification fails.

Downsides:

large, as it needs to contain the entire host certificate.
slow, as it relies on asymmetric crypto. Although caching might help when there are many requests, this approach is not as scalable as the symmetric crypto variant.
relies on on-disk certificates, which makes it hard to tie tokens to the identities of non-host entities (e.g. users or jobs)

Symmetric Key variant, i.e. Crypto Auth Token (CATs)

Using asymmetric keys works. However, it's 4 orders of magnitude slower than using symmetric keys, and at Facebook's scale that won't cut it. So can we use symmetric keys?

Yes: the symmetric variant makes use of a pseudo-random function (PRF).

There are 3 keys:

master_key msk: secret random string that can only be accessed by key distribution server.
service_key: issued to the service. This is PRF(k=master_key, m=server_info)
session_key: issued for each (client, service) pair. This is PRF(k=service_key, m=client_info)
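
The key hierarchy above can be sketched in a few lines. This is only an illustration: it assumes the PRF is instantiated with HMAC-SHA256, and the identity strings `server_info` and `client_info` are made-up placeholders. The notes do not say which PRF or encoding is actually used.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, message: bytes) -> bytes:
    """PRF instantiated with HMAC-SHA256 (an assumption made for this sketch)."""
    return hmac.new(key, message, hashlib.sha256).digest()


# Held only by the key distribution server.
master_key = secrets.token_bytes(32)

# Hypothetical identity strings, for illustration only.
server_info = b"service:photo-store"
client_info = b"client:web-frontend"

# Issued to the service by the key distribution server.
service_key = prf(master_key, server_info)

# Issued to the client for this (client, service) pair.
session_key = prf(service_key, client_info)

# The service can re-derive the same session key on its own, using only
# its service_key and the client_info, without contacting the key server.
rederived = prf(service_key, client_info)
```

Because every key is derived deterministically from the one above it, nothing per-client needs to be stored on the service side.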

A CAT is (client_info, mac(k=session_key, m=data)). Given this CAT, the server can verify it by locally deriving session_key from its own service_key and client_info, and then computing the mac and comparing it with the one in the CAT.
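
A minimal sketch of minting and verifying a CAT, assuming HMAC-SHA256 serves as both the PRF and the MAC. The function names `mint_cat` and `verify_cat`, the fixed demo master key, and the sample `data` bytes are hypothetical and chosen only for this illustration.

```python
import hashlib
import hmac
from typing import Tuple


def prf(key: bytes, message: bytes) -> bytes:
    """PRF and MAC are both HMAC-SHA256 here (an assumption for this sketch)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mint_cat(session_key: bytes, client_info: bytes, data: bytes) -> Tuple[bytes, bytes]:
    """Client side: CAT = (client_info, mac(k=session_key, m=data))."""
    return client_info, prf(session_key, data)


def verify_cat(service_key: bytes, cat: Tuple[bytes, bytes], data: bytes) -> bool:
    """Server side: re-derive session_key locally, recompute the mac, compare."""
    client_info, tag = cat
    session_key = prf(service_key, client_info)
    expected = prf(session_key, data)
    # Constant-time comparison, so timing does not leak how many bytes matched.
    return hmac.compare_digest(tag, expected)


# Demo setup with a fixed, non-secret master key (illustration only).
demo_master_key = b"\x00" * 32
service_key = prf(demo_master_key, b"service:photo-store")
other_service_key = prf(demo_master_key, b"service:other")

client_info = b"client:web-frontend"
session_key = prf(service_key, client_info)  # handed to the client

data = b"action=read;resource=/photos/1"  # hypothetical request data
cat = mint_cat(session_key, client_info, data)

ok = verify_cat(service_key, cat, data)
tampered = verify_cat(service_key, cat, b"action=delete;resource=/photos/1")
wrong_service = verify_cat(other_service_key, cat, data)
```

In this sketch `ok` is true, while `tampered` and `wrong_service` are false: changing the data invalidates the mac, and a service holding a different service_key derives a different session_key.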

Implication:

The service does NOT need to rely on any other dependency for verification.
No service other than the server can perform the verification.

Some notes:

This is used in FB for c2s authentication (see paper section 4.1 for details)
The symmetric-key variant is similar to Kerberos' one-way authentication. The difference: "signed sessions", which function similarly to Kerberos' "tickets", are generated deterministically instead of from a random string. This is advantageous here because the service can perform verification locally.
The idea of using the output of a PRF as a key to another PRF evaluation to produce a credential is similar to Macaroons.

Scaling Backend Authentication At Facebook (Lewi, RWC 2018) #14
