Closed · dankrad closed this issue 11 months ago
This is interesting, but it has significant complexity and is not comprehensive. For example:
If teams are looking at preventing concurrent signing, I would first look at preventing it at the keys. For example: lock-files next to keystores, remote signers with built-in slashing prevention (concurrent access is fine here), or hardware wallets that natively prevent concurrent access (these don't exist yet).
Every time you introduce a complex scheme "above the keys" you also add the risk of a false positive that might cause downtime. I'm not saying this scheme isn't worth considering, but I think we need to approach higher-level (than the keys) schemes with caution.
EDIT: I just learned this was discussed on the call last night (I wasn't present) so my response is lacking some context, apologies.
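For a rough sense of the "lock-files next to keystores" idea above, here is a minimal sketch (not any client's actual implementation; the paths and function names are made up for illustration). Note it only guards against two processes on the same machine, not against two machines sharing the same keys.

```python
import fcntl
import sys
from pathlib import Path

def acquire_keystore_lock(keystore_path: Path):
    """Take an exclusive, non-blocking lock on a file beside the keystore.

    If another validator-client process on this machine already holds the
    lock, exit rather than signing with the same keys concurrently.
    """
    lock_path = keystore_path.with_suffix(".lock")
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(f"keystore appears to be in use (lock held on {lock_path})")
    return lock_file  # keep the handle open for the lifetime of the process

# hypothetical usage:
# lock = acquire_keystore_lock(Path("validators/0xabc.../keystore.json"))
```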
Looking at this some more, another approach would be to wait `n` epochs and only start signing messages if no other message has been seen. This approach has the following differences to @dankrad's approach:

1. The VC just waits `n` epochs on reboot; no extra history from the BN is needed.
2. The validator cannot sign anything for the first `n` epochs after a restart.

If `n` is increased a little it also protects (sometimes) against starting two VCs at the same time. (2) is the clear trade-off for the simplicity. I'm not saying this is a perfect solution but it's exploring the space.
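A minimal sketch of what I mean, assuming a hypothetical beacon-node helper `seen_message_from(validator_index, epoch)` that reports whether any block or attestation from the given validator was observed (none of these names are real client APIs):

```python
N_WAIT_EPOCHS = 2  # how many epochs to stay silent after a reboot

def safe_to_start_signing(bn, validator_index: int, start_epoch: int) -> bool:
    """Return True once the quiet window has passed with no sign of another
    instance publishing messages for this validator."""
    current_epoch = bn.current_epoch()  # hypothetical helper
    if current_epoch < start_epoch + N_WAIT_EPOCHS:
        return False  # still inside the post-reboot quiet window
    for epoch in range(start_epoch, current_epoch):
        if bn.seen_message_from(validator_index, epoch):  # hypothetical helper
            return False  # someone else appears to be running this key
    return True
```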
I like the idea because it also protects against double starts.
Note that I already tuned my scheme so that it does not require any extra history capabilities from the BN, since I am only referencing the last attestation (which needs to be kept for LMD Ghost already).
> (which needs to be kept for LMD Ghost already)

Whilst we maintain information about that attestation for LMD Ghost, we don't actually maintain the `AttestationData` object.
For Lighthouse, all we can recall (without additional history capabilities) is the latest `attestation.data.beacon_block_root` and `attestation.data.target.epoch` for each validator.

EDIT: If those two values are identical then (if there are no client bugs) everything else in the `AttestationData` should be identical as well...
> For Lighthouse, all we can recall (without additional history capabilities) is the latest `attestation.data.beacon_block_root` and `attestation.data.target.epoch` for each validator.
Same for Prysm
Prysm also has functionality in place to check objects against the slasher server to assess whether or not a given object is going to produce a slashable condition before signing.
This, combined with a validator's local history of signed objects, seems to provide reasonable protections against slashings. However, it assumes that the slasher is synced and working properly and that the user has enabled these features. Currently the external validation (querying the slasher) is not enabled by default. The local storage of signed objects is enabled by default.
Querying the slasher also protects against potential clock skews. If someone were to set your system clock back by one hour, your validator might repeat the last hour's worth of events, which could very likely cause a double-sign slashing.
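To make the pre-signing check concrete, here is a rough sketch of the local-history part only (this is not Prysm's actual code or API; the external slasher query would sit alongside it). It keeps the last signed source/target epochs per validator and refuses anything that would be a double vote or a surround vote:

```python
from dataclasses import dataclass

@dataclass
class LastSignedAttestation:
    source_epoch: int
    target_epoch: int

def is_safe_to_sign(history: dict, validator_index: int,
                    source_epoch: int, target_epoch: int) -> bool:
    """Conservative check against the local store of signed attestations."""
    prev = history.get(validator_index)
    if prev is None:
        return True  # nothing signed before (or no local history available)
    if target_epoch <= prev.target_epoch:
        return False  # same or older target: would be a double vote (or stale)
    if source_epoch < prev.source_epoch:
        return False  # would surround the previously signed vote
    return True
```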
In any case, there is a possible race condition in the validators.
Assume the validators' clocks are perfectly in sync (+/- 1 ms).
For step 3, if both validators are talking to the same beacon node, you're probably OK but still at high risk. If not, then this doesn't solve the problem.
> For Lighthouse, all we can recall (without additional history capabilities) is the latest `attestation.data.beacon_block_root` and `attestation.data.target.epoch` for each validator.
Yeah, so I think checking these two against the local validator store of the latest signed message would be pretty valuable already.
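As a sketch of that check (all names here are hypothetical, not Lighthouse's or anyone's real API): the VC asks the BN for the latest attestation summary it can recall for the validator, i.e. the two fields above, and compares it against what the VC last signed locally; a mismatch suggests another instance is signing.

```python
def another_instance_detected(bn, local_store, validator_index: int) -> bool:
    """Compare the BN's latest recalled attestation against our local record.

    Both sides are (beacon_block_root, target_epoch) tuples, or None if
    nothing is known for this validator.
    """
    seen = bn.latest_attestation_summary(validator_index)    # hypothetical
    ours = local_store.last_signed_summary(validator_index)  # hypothetical
    if seen is None:
        return False  # the BN has seen nothing for this validator
    if ours is None or seen != ours:
        return True   # the latest network-visible attestation is not ours
    return False

# A VC could run this on start-up and refuse to operate (absent an override
# flag) whenever it returns True.
```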
> In any case, there is a possible race condition in the validators.
I am aware of this race condition, and it's unavoidable. I just want to catch those cases which we can easily catch in the current framework, which might already prevent a lot of user-error slashings.
This has been implemented in all clients as doppelganger detection (see the Nimbus guide as an example: https://nimbus.guide/doppelganger-detection.html).
One route is standardised for it:
Starting a validator client for the same validator twice is of course a user error. However, it's probably the most common user error that leads to slashing. If the two clients are started at the same time, there is no way to catch this error; however, in all other cases, the second client to be started can detect that the last attestation made by the validator does not match its local storage and refuse operation (unless an override flag is given).
Suggested solution: