Looks like a good start. Initial thoughts:
The attester interface seems generally useful. Could land it separately to reduce the size of this PR.
In addition to reporting update failures, the leader should probably drop workers from its list if it can't update them, to handle instances that have failed. We probably also want some guards against stale nodes, especially since key rotation may happen no more often than every few weeks. Maybe workers should re-register periodically, as a keep-alive that prevents expiry from the leader's list. Likewise, workers should periodically check their key material against the leader and terminate if they haven't received an update. Maybe both could be combined into a single keep-alive ping?
Quick summary of where we are with the PR:
What remains to be done:
Also, the scripts/ directory contains a few shell scripts that help with testing key synchronization locally.
Are there advantages to having separate registration and heartbeat endpoints? If the initial registration request contained an empty body (or a hash of null key material), the leader could use the same logic to schedule a key exchange. When workers send subsequent registration requests with a current key hash, that could work the same as a heartbeat, updating the leader's list of workers without triggering an immediate keysync.
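The unified endpoint described above could be as simple as comparing the submitted key hash against the leader's current one; an empty or stale hash triggers a keysync, while a current hash acts as a plain heartbeat. A minimal sketch, with all names (`leader`, `register`, `currentKeyHash`) hypothetical:

```go
package main

// leader holds the hash of its current key material plus the set of
// enrolled workers. Field names are illustrative only; keep-alive
// timestamps are omitted for brevity.
type leader struct {
	currentKeyHash string
	workers        map[string]bool
}

// register implements a single registration/heartbeat endpoint: it always
// (re-)enrolls the worker, and returns true iff the leader should schedule
// a key exchange. An empty body hashes to "" here, so a fresh worker and a
// worker with stale keys take the same code path, while a request carrying
// the current hash behaves exactly like a heartbeat.
func (l *leader) register(addr, keyHash string) (syncNeeded bool) {
	l.workers[addr] = true
	return keyHash != l.currentKeyHash
}
```

One nicety of this design is that the leader needs no per-endpoint state machine: the worker's report alone determines whether a keysync is scheduled.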
That's a good observation. Fixed in 697f033.
@kdenhartog IIUC, all your remarks should now be addressed. Let me know if there's anything else that requires clarification!
Removing the "work-in-progress" because it no longer is. (cc @rillian, @DJAndries)
Let's merge and address future issues in subsequent PRs.
Resolves #10