ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Client Implementation - Offer Queues vs Thundering Herds #282

Closed pipermerriam closed 5 months ago

pipermerriam commented 8 months ago

What is wrong

During gossip of new content, clients will experience a small thundering herd of offers for the same content key.

A naive implementation may end up concurrently accepting the same content from multiple peers. Measurements in the trin client suggest that as many as 1/3 of incoming UTP streams may be redundant.

A simple attempt to fix this would be to queue incoming offers: the first offer for a content key is accepted, and subsequent offers for the same content are held in a queue. If the main transfer succeeds, the queued offers are rejected; if it fails, one of the queued offers can be accepted instead. This approach is problematic because queued OFFER messages will time out while they wait for a response.
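A minimal sketch of such an offer queue, to make the timeout problem concrete. Everything here is hypothetical and not taken from trin or any other client: the `OfferQueue` class, its method names, the string peer ids, and the `offer_timeout` parameter are all assumptions, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class PendingOffer:
    peer_id: str
    received_at: float  # seconds, from any monotonic clock


@dataclass
class KeyState:
    active_peer: str  # peer whose offer we accepted and are transferring from
    waiting: Deque[PendingOffer] = field(default_factory=deque)


class OfferQueue:
    """Hypothetical per-content-key offer queue (illustration only)."""

    def __init__(self, offer_timeout: float) -> None:
        # Assumed value: how long an offering peer waits for our reply.
        self.offer_timeout = offer_timeout
        self._keys: Dict[bytes, KeyState] = {}

    def on_offer(self, content_key: bytes, peer_id: str, now: float) -> bool:
        """Return True if this offer should be accepted right away."""
        state = self._keys.get(content_key)
        if state is None:
            self._keys[content_key] = KeyState(active_peer=peer_id)
            return True
        # A transfer for this key is already in flight: hold the offer.
        state.waiting.append(PendingOffer(peer_id, now))
        return False

    def on_transfer_success(self, content_key: bytes) -> List[str]:
        """Return the peers whose queued offers can now be rejected."""
        state = self._keys.pop(content_key, None)
        return [offer.peer_id for offer in state.waiting] if state else []

    def on_transfer_failure(self, content_key: bytes, now: float) -> Optional[str]:
        """Return the next peer whose queued offer has not timed out, or None."""
        state = self._keys.get(content_key)
        if state is None:
            return None
        while state.waiting:
            offer = state.waiting.popleft()
            if now - offer.received_at < self.offer_timeout:
                state.active_peer = offer.peer_id
                return offer.peer_id
        # Every queued offer has expired, so there is nothing to fall back on.
        del self._keys[content_key]
        return None
```

If the first transfer fails only after `offer_timeout` has elapsed, `on_transfer_failure` returns `None`: the queue holds nothing usable, which is exactly the weakness described above.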

The problem that needs to be solved here is malicious actors taking advantage of this optimization to disrupt gossip. A malicious actor could write client code that is optimized to quickly offer new content to nodes in order to be first in line in the offer queue. They could then slow the UTP transfer rate down to draw out the time it takes to transfer the data and then fail the transfer just before it completes. The receiving node would then need to fall back onto one of the other, hopefully legitimate, offers.

How can this be fixed?

The malicious actor would be able to ensure that all of the offers in the queue have timed out, which means that a defense against this probably involves tracking which nodes recently offered the content and then retrieving it from one of them with a FINDCONTENT request.
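One way this tracking could look, as a rough sketch rather than a proposal for any particular client. The `RecentOfferers` class, the `retention` window, and `recover_content` are hypothetical names, and `find_content` is an assumed callback standing in for whatever a client uses to send a FINDCONTENT request to a peer; it returns the content bytes or `None`.

```python
from typing import Callable, Dict, List, Optional, Tuple


class RecentOfferers:
    """Hypothetical record of which peers recently offered each content key."""

    def __init__(self, retention: float) -> None:
        # Assumed window (seconds) for which an offer is remembered.
        self.retention = retention
        self._seen: Dict[bytes, List[Tuple[str, float]]] = {}

    def record(self, content_key: bytes, peer_id: str, now: float) -> None:
        """Remember that peer_id offered content_key at time now."""
        self._seen.setdefault(content_key, []).append((peer_id, now))

    def candidates(self, content_key: bytes, now: float, exclude: str) -> List[str]:
        """Peers that offered this key within the window, skipping `exclude`."""
        fresh = [
            (peer, seen_at)
            for peer, seen_at in self._seen.get(content_key, [])
            if now - seen_at <= self.retention
        ]
        if fresh:
            self._seen[content_key] = fresh
        else:
            self._seen.pop(content_key, None)
        return [peer for peer, _ in fresh if peer != exclude]


def recover_content(
    tracker: RecentOfferers,
    content_key: bytes,
    failed_peer: str,
    now: float,
    find_content: Callable[[str, bytes], Optional[bytes]],
) -> Optional[bytes]:
    """After a failed transfer, ask other recent offerers for the content."""
    for peer in tracker.candidates(content_key, now, exclude=failed_peer):
        data = find_content(peer, content_key)
        if data is not None:
            return data
    return None
```

The point of the design is that recovery no longer depends on the original OFFER messages still being live: the receiving node initiates a fresh request, so a peer that stalled the first transfer cannot also run out the clock on the fallback.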

This solution should work great for history network content.

This solution is problematic for state network content because the OFFER and FINDCONTENT payloads differ. For it to work in the state network, the content keys would need to contain enough information for the receiving node to reconstruct the proof, since it won't have a reference to the block hash that the proof is anchored under...

pipermerriam commented 5 months ago

Closing this as there's nothing actionable at this stage.