
sharding attestation policing across validators/nodes #63

Closed djrtwo closed 2 years ago

djrtwo commented 5 years ago

sharding attestation policing across validators/nodes

For the network to be well-policed wrt attestation slashings, we need to be actively checking for slashable attestations within the "weak subjectivity period". The weak subjectivity period is variable in eth2 due to the validator queuing mechanism, but it is on the order of ~8 months when 10M ETH (300k validators) is participating.

This brings up the non-trivial challenge of efficiently storing attestations and related data, and searching through this space on demand when new (and potentially historic) attestations come in.

After some consideration of the load of attestations and search data that needs to be stored, it seems prudent to shard this responsibility across nodes/validators by default, rather than have each validator attempt to store and search the entire space. It is also assumed that large data consumers (block explorers, API providers, etc) will store and search the entire space.

This issue discusses efficient search algorithms, along with how to split up the search space.

Slashing search

There are two types of slashable attestations -- "double vote" and "surround vote". Double votes are relatively simple to find, whereas surround votes take a little more consideration.

The strategies discussed here attempt to minimize database reads for attestations when searching and instead rely upon relevant bitfields and epochs related to each validator. If a slashing is detected, then a more intensive lookup of the particular attestation in the DB is warranted.

Double vote

Detecting a double vote is relatively simple and requires storing only N bits per validator where N is the number of epochs in the weak subjectivity period. When a new attestation comes in for an epoch, for each validator in the attestation, check whether the double_vote bit for that validator at that epoch is set to 1. If it is set to 1, a slashable message has been found; retrieve it from the DB. If it is 0, set it to 1.

This is a constant-time search and constant-time update per validator, so the total work is linear in the number of validators in an attestation.
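A minimal sketch of this per-validator bitfield check (illustrative only; the class name, the flat ring-buffer indexing, and the 54k-epoch constant are assumptions, not part of the original proposal):

```python
# Hedged sketch of the double-vote bitfield described above.

WEAK_SUBJECTIVITY_EPOCHS = 54_000  # ~8 months of epochs (assumed constant)

class DoubleVoteTracker:
    def __init__(self, num_validators: int):
        # One flag per (validator, epoch); a bytearray keeps the sketch simple
        # even though a real bitfield would be 8x smaller.
        self.seen = bytearray(num_validators * WEAK_SUBJECTIVITY_EPOCHS)

    def process(self, validator_index: int, epoch: int) -> bool:
        """Return True if this validator already attested in `epoch`,
        i.e. a double-vote candidate worth a full attestation lookup in the DB."""
        # Rotation/pruning of epochs older than the weak subjectivity period is elided.
        i = validator_index * WEAK_SUBJECTIVITY_EPOCHS + (epoch % WEAK_SUBJECTIVITY_EPOCHS)
        if self.seen[i]:
            return True
        self.seen[i] = 1
        return False
```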

Surround vote

Detecting a surround vote is a bit more complex and requires more data. An algorithm developed by @vbuterin requires 2*N*log(N) bits per validator. The search is constant time per validator, and the data update is linear in the worst case but typically sub-linear, near constant time.

The algorithm requires storing 2*N epochs per validator. The basic principle is the following:

For each epoch E and for each validator, maintain the highest target epoch among attestations whose source <= E. This yields an always-sorted (non-decreasing) array that can be updated by starting from the new attestation's source position and moving right until no further maxima need adjusting. When a new attestation is received, check it against this array for each validator in the attestation to determine whether the new attestation is surrounded by any prior attestation. To check whether the new attestation surrounds an earlier one, maintain the mirror of the array described above and perform the opposite checks.
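A rough sketch of the per-validator max-target array and the "surrounded by a prior attestation" check (names and sizes are assumptions; the mirrored min-target array for the opposite direction works the same way in reverse and is omitted):

```python
# Illustrative per-validator state for surround detection, per the description above.

NUM_EPOCHS = 54_000  # assumed number of epochs tracked

class SurroundTracker:
    def __init__(self):
        # max_target[e] = highest target epoch among recorded attestations with source <= e,
        # which makes the array non-decreasing (always sorted).
        self.max_target = [0] * NUM_EPOCHS

    def is_surround_candidate(self, source: int, target: int) -> bool:
        # A prior attestation with source <= `source` and a higher target may
        # surround the new one; confirm with a DB lookup before slashing.
        return self.max_target[source] > target

    def record(self, source: int, target: int) -> None:
        # Walk right from the source position, raising maxima until no further
        # adjustment is needed (usually a short walk).
        e = source
        while e < NUM_EPOCHS and self.max_target[e] < target:
            self.max_target[e] = target
            e += 1
```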

Sharding responsibility

Although these algorithms are very efficient in search time, the data required to maintain these lists can become quite large, on top of the already large requirement of storing historic attestations through the weak subjectivity period; hence the need to shard responsibility for the attestation slashing search space across validators.

We recommend strong client defaults in this matter that (1) provide sufficient policing of the network in case the 2/3 honesty assumption fails wrt safety, while at the same time (2) do not place excessive resource requirements on any one validator.

In addition to consumer node policing, we fully expect large data users to police the full dataset, as they will already be storing it for other purposes. Fortunately, policing of attestations works as long as a minimum of one honest user sufficiently monitors the entire weak subjectivity period's worth of attestations.

There are multiple options for strong user defaults. The most appealing scales with the number of validators working off of a node, but because most BN vs VC architectures do not persistently store which validators are connected to a BN, this might require additional tracking that is not currently in place. To this end, a BN could catalogue the pubkeys for which validator attestation creation requests have been made and scale responsibility that way. If the user default goes in this direction, it likely makes sense not to broadcast slashable attestations immediately when found, but instead to eventually package them into blocks for one of the associated pubkeys connected to the BN, for the profitability of that validator.

An alternative is a slightly higher per-node default epoch range that a node is responsible for, disregarding which validators might be connected. In this case, it likely makes sense to instead broadcast slashable messages immediately so they can be picked up by any validator and put into a block. Because the node does not persistently know whether validators are connected to it, a slashable message might otherwise be kept around locally for an indefinite period of time without serving any value to the network.

djrtwo commented 5 years ago

You can actually construct a much less data-intensive no-surround checker that has the potential for false positives. Essentially, use the mechanism proposed by vitalik, but instead of doing it per validator, just do it across all attestations. When a historic attestation comes in, check it against this mechanism. If you find a surround, do a deeper search in the database to determine whether the surrounded attestation has a collision in participating validators. If so, you have found a slashable surround; if not, it is a false positive.

This operates under two assumptions

  1. It is expensive to construct valid attestations (each validator can only produce a max of 1 per epoch before they are slashable), so it is expensive to DoS
  2. Due to the way in which attestations are constructed against valid states, it is abnormal for surrounded attestations to occur naturally

Thus a valid, historic attestation that surrounds anything is already a red flag.
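A hedged sketch of that two-step flow, reusing the max-target idea but keeping a single global array and falling back to the database only on a hit; it is shown for the "new attestation is surrounded by an older one" direction, with the mirror direction analogous, and the DB query and attestation fields are hypothetical:

```python
# Illustrative: one global max-target-by-source array over *all* attestations,
# used as a cheap filter; the expensive DB work only happens on a hit.

NUM_EPOCHS = 54_000  # assumed
global_max_target = [0] * NUM_EPOCHS

def process_attestation(att, db) -> str:
    result = "ok"
    # Cheap check, may produce false positives.
    if global_max_target[att.source] > att.target:
        result = "false positive"
        # Deeper search: does any surrounding attestation share a participating validator?
        for old in db.find_surrounding(att.source, att.target):  # hypothetical query
            if set(att.validator_indices) & set(old.validator_indices):
                result = "slashable surround"
                break
    # Record the new attestation in the global array either way.
    e = att.source
    while e < NUM_EPOCHS and global_max_target[e] < att.target:
        global_max_target[e] = att.target
        e += 1
    return result
```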


A note on the data requirements of the non-false-positive surround checker in the original post

For 8 months of epochs, this is ~1MB per validator (54k epochs × 2 × 8 bytes per epoch), which is something like 300GB of surround metadata for 300k validators. The 8 bytes per epoch can actually be truncated to just log2(epochs_per_weak_subjectivity_period) ~= 2 bytes, reducing the load to ~75GB of surround metadata, but this still seems on the high side for simple search metadata.
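Reproducing the arithmetic behind these figures (values taken directly from the comment):

```python
# Back-of-the-envelope check of the numbers above.
epochs = 54_000          # ~8 months of epochs
validators = 300_000

full = epochs * 2 * 8                    # two arrays, 8-byte entries
print(full)                              # 864_000 bytes  -> ~1 MB per validator
print(full * validators / 1e9)           # ~259 GB total  -> "something like 300GB"

truncated = epochs * 2 * 2               # ~2-byte (log2(54k)-bit) entries
print(truncated * validators / 1e9)      # ~65 GB total   -> the quoted ~75GB figure
```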

protolambda commented 5 years ago

See https://github.com/protolambda/eth2-surround for more options/ideas to implement surround matching with. With the last option, using Manhattan distances, you end up spending only 1 bit per validator per Manhattan index. So that would be (54000+3150)*300000 bits = ~2.14 GB. Then add a simple index which tells which attestations have been seen to match a given Manhattan index (1 best case, ~2 average case with reasonable finality) to look up the actual attestation with. Things could be faster with a few layered indices (batches of validators instead of per-validator). So that would be approx. 2.2 - 2.5 GB for 8 months of matching data (about 3.3% of the previous optimized 75GB case).

shayzluf commented 5 years ago

@protolambda still, we have to store the attestations themselves in order to create a slashing container? 54000 epochs × 2344 committees per epoch × 328 bytes ≈ 41 GB using aggregated attestations

protolambda commented 5 years ago

@shayzluf yes, that is the minimal requirement for eventual use. But 99% of the time, you likely don't use it. So ideally you put the old attestations into cheap (and possibly distributed/decentralized) storage, and only deal with the data that enables detecting the slashable events (the 2-3GB mentioned earlier) as efficiently as possible. (And this may be with some error rate, it's more than fine to look up an underlying attestation some small X% of the time, if detection is not 100% strict).

shayzluf commented 5 years ago

@protolambda While we were implementing the recommended double vote detection method using a bitlist, we faced an issue where aggregated attestations with overlapping indices triggered a false positive for a slashable offense. A single bit per validator doesn't seem like enough to prevent this false positive. Do you have any suggestions on preventing the false positives for this case from occurring?

For example: Say we have a setup with 2 nodes, and 3 validator sets A, B, C. If we receive one aggregated attestation of set A + B, and then receive an aggregated attestation for the same data but of B + C, the bitlist per validator method would report the second attestation as a slashable offense for all validators in set B.
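A tiny reproduction of that failure mode with the naive one-bit scheme (the validator indices are hypothetical):

```python
# Naive one-flag-per-(validator, epoch) tracking falsely flags set B when two
# overlapping aggregates (A+B, then B+C) carry the same AttestationData.
seen = set()  # (validator_index, epoch) pairs already marked as attested

def naive_check(validator_indices, epoch):
    flagged = sorted(v for v in validator_indices if (v, epoch) in seen)
    seen.update((v, epoch) for v in validator_indices)
    return flagged

A, B, C = {1, 2}, {3, 4}, {5, 6}      # hypothetical validator sets
print(naive_check(A | B, epoch=10))    # [] -- nothing flagged
print(naive_check(B | C, epoch=10))    # [3, 4] -- set B falsely flagged
```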

djrtwo commented 5 years ago

So what we want to do is cheaply decide if it is worth it to do a more expensive DB lookup. In such a case, we can tolerate some false positives (if they aren't cheap to construct), but don't want any false negatives.

Unfortunately, even without the above aggregation exploit, you could just send a full duplicate attestation (A in the database, and A comes in on the wire sometime in the future) and it would trigger the false positive. This is true as long as we don't keep any easily accessible data about the attestation. Adding the set of 32-byte hashes associated with the various AttestationData each epoch, and mapping them to indices, could help solve this at the cost of 32 bytes times the diversity of AttestationData per epoch (assume a min of 1, max of 10 or so). In the case above (A+B and B+C), we would have the associated hash of AttestationData for the original A+B and see that B isn't at fault for B+C.
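A hedged sketch of that refinement: keep the AttestationData hash root per (validator, epoch) instead of a single bit, so identical data is recognized as a harmless duplicate (names are assumptions):

```python
# Sketch: map (validator, epoch) -> AttestationData root instead of a single bit.
seen_roots: dict = {}  # {(validator_index, epoch): 32-byte data root}

def check(validator_index: int, epoch: int, data_root: bytes) -> str:
    key = (validator_index, epoch)
    prev = seen_roots.get(key)
    if prev is None:
        seen_roots[key] = data_root
        return "new"
    if prev == data_root:
        return "duplicate"              # same data, e.g. overlapping aggregates or a replay
    return "double-vote candidate"      # different data for the same epoch -> DB lookup
```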

At the end of the day, this and similar methods don't fully solve the problem. There is a DoS issue prior to this check. Step 1 is to figure out which attesters are even a part of the attestation and if the signature is valid. If an historic attestation comes in from the wire, the only way to find out this information is to pull up the historic BeaconState context and check the attestation against the historic shuffling (active validator set and seeds of randomness). This is an expensive DB read that could also be associated with expensive computation depending upon how frequently historic states are stored in the DB.

This becomes a major DoS vector as an attacker can even send invalid attestations to a policing node for free (not slashable) causing the node to do expensive historic lookups.

Still more thought to be done on this :/

Personal notes that might be useful

Goal:

Load:

Exploits:

Needs:

Questions:

protolambda commented 4 years ago

New progress on surround vote matching: https://github.com/protolambda/eth2-surround#min-max-surround The min-max tracking should be sufficient for slashings, and although the naive thing takes 64GB, there is a lot of repetitive data which I think allows for compression to the 1-2 GB range :)

mkalinin commented 4 years ago

There is a compilation of recent approaches and ideas around this topic that I've put together -- https://hackmd.io/@sYlY_LZpQIGgFmhdv6vV-A/By897a5sH.

This write-up contains some new ideas/approaches and a design overview, featuring a size evaluation of the suggested indices. The document should be useful for those working on the slasher component of a beacon chain client.

Indices design overview:

greg7mdp commented 3 years ago

I have a question about the following:

Double vote: Detecting a double vote is relatively simple and requires storing only N bits per validator where N is the number of epochs in the weak subjectivity period. When a new attestation comes in for an epoch, for each validator in the attestation, check if the double_vote bit for that validator at that epoch is set to 1. If set to 1, a slashable message has been found; retrieve it from the DB. If set to 0, set it to 1.

I think that typically most bits will be set to 1, since validators are expected to attest at every epoch.

Is this correct? If it is, we can easily lower the storage required per validator without slowing down the access time.
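The thread doesn't propose a concrete scheme for this, but one hypothetical way to exploit "mostly ones" is to invert the encoding: track a high-water mark plus the (normally small) set of missed epochs per validator:

```python
# Hypothetical, not a scheme from the thread: store the highest epoch seen plus
# the set of gaps, rather than one bit per epoch, per validator.
class SparseDoubleVoteTracker:
    def __init__(self):
        self.highest_epoch = -1   # every epoch <= highest_epoch is assumed attested...
        self.gaps = set()         # ...except the epochs listed here

    def process(self, epoch: int) -> bool:
        """Return True if this epoch was already attested (double-vote candidate)."""
        if epoch <= self.highest_epoch:
            if epoch in self.gaps:
                self.gaps.discard(epoch)
                return False
            return True
        # Any skipped epochs between the old mark and the new epoch become gaps.
        self.gaps.update(range(self.highest_epoch + 1, epoch))
        self.highest_epoch = epoch
        return False
```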

protolambda commented 3 years ago

@greg7mdp Hey Gregory, that is correct. However, tracking double votes is the easy light-weight part compared to detecting surround votes. And this is a bit of an old issue you responded to (more than a year before mainnet launch!). Are you writing a slasher and looking for resources?

I recommend you take a look at this great write-up about slashing detection: https://hackmd.io/@sproul/min-max-slasher The Lighthouse team did a great job implementing their slasher, you can see it being introduced here: https://github.com/sigp/lighthouse/pull/1567

greg7mdp commented 3 years ago

I had another thought. The discussion here assumes that the slasher program is responsible for checking attestations from all validators, which obviously does not scale well. Another issue is that it allows DoS attacks, since validating attestations requires expensive database lookups.

What if, instead, each validator also had the responsibility to police attestations from a group of at most 256 other specific validators? This would drastically reduce the memory/disk requirements for storing the state necessary to detect double and surround votes (256 < 300k), and the requirement would not increase with the number of validators. Also, validators could cache the data necessary to validate the signatures of attestations from the 256 validators they police, and ignore attestations from validators they don't police, reducing the DoS attack vector.
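The comment doesn't fix an assignment rule; purely as an illustration, a deterministic mapping from a validator's own index to the window it polices could look like the following (all names and the windowing rule are hypothetical):

```python
# Hypothetical assignment rule, not from the thread: each validator polices a
# fixed window of at most 256 other validators derived from its own index.
GROUP_SIZE = 256

def policed_window(own_index: int, total_validators: int) -> range:
    start = (own_index * GROUP_SIZE) % total_validators
    return range(start, min(start + GROUP_SIZE, total_validators))

def should_police(own_index: int, attester_index: int, total_validators: int) -> bool:
    return attester_index in policed_window(own_index, total_validators)
```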

greg7mdp commented 3 years ago

@protolambda thanks for the answer to my previous question. I realize this is an old comment thread. Is there a place where the discussion is still happening? The link you sent doesn't seem like the right place to discuss?

protolambda commented 3 years ago

@greg7mdp

DoS attacks are not a big concern; a slasher generally scales well and does not run costly computations like BLS signature verification (the eth2 nodes that provide the slashings do the validation). And it can indeed be sharded per group of validators, or it can just track a shorter time period in memory. This part of the network is also something anyone can do at any point; DoS attacks are more of a problem when there is a single point of failure that can be attacked selectively.

For technical chat you can visit the Eth R&D discord server (https://discord.gg/SvWfM8A5fA); the eth2-tooling channel may be a good fit. Or I can create a new one specifically for slashing if there is a larger discussion and more interest.

And are you going to write your own slashing detection, contributing to some client, or have other plans? I'm always interested to see what new projects are getting started.

greg7mdp commented 3 years ago

@protolambda I don't have any precise plans on what I'll do, I just want to learn and contribute to eth 2.0. I'm a good C++ developer but I don't think this language is very much in use in the eth community. I happened to be reading this issue and a description of the slasher designs and just thought I'd ask some questions.

About the min-max slasher description, I have not read it, but I can't help wondering whether it is a complex solution designed to scale to 300k validators, and whether a simpler solution (vbuterin's?) could be used if we sharded per group of validators as an integral part of the protocol.

protolambda commented 3 years ago

@greg7mdp The problem is that to selectively watch a few validators, you have to follow their moves. Validators shuffle around to different committees, in different gossip subnets, to publish their attestations. At that point, you are following essentially every subnet to track the same subset of validators all the time. Tracking just what you happen to see on a select few subnets is possible too, but it makes the data more sparse and more difficult to compress. There is the aggregate attestation topic as well, but it is so similar to on-chain data, and so easy to police, that it is neither the primary concern nor a good source of slasher income.

So at that point it is more optimal to just track everything if you are on a lot of attestation subnets anyway, making it a centralization risk. The question is not really "can we scale it?", but more "can we do all that work on more minimal hardware?". That enables more validators to do all that policing and compete for slasher rewards. Also, please have a look at the min-max approach and the Lighthouse implementation; it may take time to digest, but in the end the data you deal with is simple and minimal.

greg7mdp commented 3 years ago

Thanks @protolambda, I'll have a look at the current implementation. I thought that the beacon node had to keep track of all attestations from all validators, but I was probably mistaken; I don't have a very good understanding of the architecture yet.