Revoking a yubikey within a cluster

michael-lazar commented 7 years ago

I have a cluster of validation servers and I would like to enable/disable individual yubikeys cluster-wide. I found the /revoke endpoint, but that appears to only change the active flag on the local database (it doesn't use the synchronization queue). I would have to hit /revoke on every server in the pool. If a server happened to be offline when that request was made, there doesn't appear to be a way to maintain integrity across the cluster. The normal /sync calls do not include this field.

I also discovered that the /resync command skips synchronizing keys that have the active flag set to false. So if I follow step 10 described here to add a new server to the pool, there will be a data mismatch if I try to verify an inactive yubikey on the new server.

Do you have any recommendations for how I should handle this? It looks like the active/inactive flag is a partially implemented feature that hasn't been thought out for clustered setups. Or maybe I'm just misunderstanding its purpose?

klali commented 7 years ago

I'd say that your last statement is quite true: active isn't implemented enough to be usable with only this project. Yubico used to run a service for revoking keys in YubiCloud that had its own queue of revocations in case a server was down.

As the sync protocol looks right now, the active flag isn't transmittable over it at all. I guess the easiest way to handle new servers would be to let resync hit the revoke endpoint for inactive identities.

michael-lazar commented 7 years ago

Here's what I ended up doing in case anybody stumbles across this issue:

I couldn't come up with a robust + secure way to use /revoke. I didn't like that I had to rely on an IP whitelist for access instead of an API key with more fine-grained permission controls. I also ran into some edge-cases trying to implement synchronization due to the simplistic error handling in the /revoke endpoint.

I decided to go in another direction and approached this from the KSM level. I'm already storing my AEAD keys in a MySQL database, which is being replicated across my cluster. So I added an active flag to the AEAD table:

CREATE TABLE ykksm.aead_table (
  public_id varchar(16) NOT NULL,
  keyhandle INT NOT NULL,
  nonce BLOB(6) NOT NULL,
  aead BLOB(32) NOT NULL,
  active BOOL NOT NULL,
  PRIMARY KEY (public_id, keyhandle)
);

I added some logic to my KSM servers to check the active flag when they try to load a key handle. If it's false, the KSM will return an ERR Disabled public_id failure. This gets bubbled up in the yubikey-val server as a generic BAD_OTP response.
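A minimal sketch of that kind of check, not the actual KSM code. It uses an in-memory SQLite database in place of the replicated MySQL table so that it runs on its own; the function names, the sample rows, and the "ERR Unknown public_id" string are hypothetical, and only the table columns and the "ERR Disabled public_id" response come from the description above.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for the replicated AEAD table (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE aead_table (
          public_id varchar(16) NOT NULL,
          keyhandle INT NOT NULL,
          nonce BLOB(6) NOT NULL,
          aead BLOB(32) NOT NULL,
          active BOOL NOT NULL,
          PRIMARY KEY (public_id, keyhandle)
        )
        """
    )
    conn.executemany(
        "INSERT INTO aead_table VALUES (?, ?, ?, ?, ?)",
        [
            ("ccccccbtbhnh", 8192, b"\x00" * 6, b"\x01" * 32, 1),  # enabled key
            ("ccccccbtbhnv", 8192, b"\x00" * 6, b"\x02" * 32, 0),  # revoked key
        ],
    )
    return conn


def load_aead(conn, public_id, keyhandle):
    """Return (status, payload); refuse to hand out the AEAD of an inactive key."""
    row = conn.execute(
        "SELECT nonce, aead, active FROM aead_table "
        "WHERE public_id = ? AND keyhandle = ?",
        (public_id, keyhandle),
    ).fetchone()
    if row is None:
        # Hypothetical wording for the not-found case.
        return "ERR Unknown public_id", None
    nonce, aead, active = row
    if not active:
        # The validation server surfaces this as a generic BAD_OTP.
        return "ERR Disabled public_id", None
    return "OK", (nonce, aead)
```

Because the flag lives in the same replicated table as the AEADs, every KSM in the cluster sees the same state without any help from the validation servers' sync queue.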

This allows me to sidestep both of the issues that I outlined in my previous post. I've disabled the /revoke endpoint, and from the validation server's perspective, every yubikey is active now. Our administrators can use the same internal tool to provision new yubikeys into our AEAD database, as well as enable/disable yubikeys if they need to be revoked.
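For illustration, the enable/disable action in such a tool could come down to a single UPDATE against the replicated table. This is a sketch under assumptions, not the internal tool itself: `set_active` and `make_table` are hypothetical names, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3


def make_table(conn):
    """Create a stand-in for the AEAD table with the active column."""
    conn.execute(
        """
        CREATE TABLE aead_table (
          public_id varchar(16) NOT NULL,
          keyhandle INT NOT NULL,
          nonce BLOB(6) NOT NULL,
          aead BLOB(32) NOT NULL,
          active BOOL NOT NULL,
          PRIMARY KEY (public_id, keyhandle)
        )
        """
    )


def set_active(conn, public_id, active):
    """Enable or disable every key handle stored for one yubikey.

    Returns the number of rows changed, so the caller can detect an
    unknown public_id (0 rows) instead of silently doing nothing.
    """
    cur = conn.execute(
        "UPDATE aead_table SET active = ? WHERE public_id = ?",
        (1 if active else 0, public_id),
    )
    conn.commit()
    return cur.rowcount
```

Database replication then carries the change to the rest of the cluster, which is what the /revoke endpoint could not do.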

Yubico / yubikey-val

Revoking a yubikey within a cluster #45