What is wrong
During gossip of new content, clients will experience a small thundering herd of offers for the same content key.
A naive implementation may end up concurrently accepting the same content from multiple peers. Measurements in the trin client suggest that as many as one third of incoming uTP streams may be redundant.
A simple fix would be to queue incoming offers per content key: accept the first offer, and hold subsequent offers for the same content so that they can be rejected if the first transfer succeeds or, if it fails, one of them can be accepted instead. This approach is problematic because the queued OFFER messages will time out after a while, so by the time the first transfer has failed, the backup offers may no longer be acceptable.
The problem to solve here is malicious actors taking advantage of this optimization to disrupt gossip. A malicious actor could write client code optimized to offer new content to nodes as quickly as possible, so that it is first in line in the offer queue. It could then throttle the uTP transfer to draw out the time the transfer takes, and fail the transfer just before it completes. The receiving node would then need to fall back to one of the other, hopefully legitimate, offers.
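A minimal sketch of the queueing scheme and why the stall attack defeats it. Everything here is hypothetical and for illustration only: `OfferQueue`, its methods, the string peer IDs, and the 30-second `offer_timeout` are assumptions, not trin's or the spec's actual types or values. The point is that a fallback is only possible while a queued OFFER is still within its timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingOffer:
    peer_id: str
    received_at: float  # local time (seconds) when the OFFER arrived


@dataclass
class OfferQueue:
    """Hypothetical per-content-key offer queue (illustrative only)."""

    offer_timeout: float  # assumed lifetime of an unanswered OFFER, in seconds
    active: Dict[bytes, str] = field(default_factory=dict)
    waiting: Dict[bytes, List[PendingOffer]] = field(default_factory=dict)

    def on_offer(self, content_key: bytes, peer_id: str, now: float) -> bool:
        """Return True to ACCEPT immediately, False if the offer is queued as a backup."""
        if content_key not in self.active:
            self.active[content_key] = peer_id
            return True
        self.waiting.setdefault(content_key, []).append(PendingOffer(peer_id, now))
        return False

    def on_transfer_success(self, content_key: bytes) -> List[str]:
        """Transfer finished: return every queued peer, whose offers can now be rejected."""
        self.active.pop(content_key, None)
        return [o.peer_id for o in self.waiting.pop(content_key, [])]

    def on_transfer_failure(self, content_key: bytes, now: float) -> Optional[str]:
        """Fall back to the oldest queued offer that has not yet timed out, if any."""
        self.active.pop(content_key, None)
        queue = self.waiting.get(content_key, [])
        while queue:
            offer = queue.pop(0)
            if now - offer.received_at < self.offer_timeout:
                self.active[content_key] = offer.peer_id
                return offer.peer_id
        self.waiting.pop(content_key, None)  # nothing usable left for this key
        return None


# Usage: the attacker is accepted first, then stalls past the offer timeout.
queue = OfferQueue(offer_timeout=30.0)
key = b"example-content-key"
queue.on_offer(key, "attacker", now=0.0)  # accepted
queue.on_offer(key, "honest-1", now=0.5)  # queued
queue.on_offer(key, "honest-2", now=1.0)  # queued
print(queue.on_transfer_failure(key, now=40.0))  # None: every backup offer expired
```

Had the first transfer failed quickly (say at `now=10.0`), `on_transfer_failure` would have returned `"honest-1"`. The attacker only needs to keep its transfer alive longer than the offer timeout to leave the node with no usable fallback.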
How can this be fixed?
The malicious actor can stall its transfer long enough for every queued offer to time out, so a defense probably involves tracking which nodes recently offered the content, and then retrieving that content from one of them with a FINDCONTENT request.
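A minimal sketch of that defense, assuming a hypothetical `RecentOfferers` record and a caller-supplied `find_content(peer, content_key)` callable standing in for a FINDCONTENT request (neither is an existing trin or spec API). Because the node initiates the FINDCONTENT itself, recovery no longer depends on the original OFFER still being open; the `ttl` and per-key peer cap are illustrative values, and content validation is omitted.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class RecentOfferers:
    """Hypothetical record of which peers recently OFFERed each content key."""

    def __init__(self, ttl: float, max_peers_per_key: int = 8) -> None:
        self.ttl = ttl  # how long an offer is remembered, in seconds (assumed)
        self.max_peers = max_peers_per_key
        self._seen: Dict[bytes, "OrderedDict[str, float]"] = {}

    def record(self, content_key: bytes, peer_id: str, now: float) -> None:
        """Remember that peer_id offered content_key at time now."""
        peers = self._seen.setdefault(content_key, OrderedDict())
        peers[peer_id] = now
        peers.move_to_end(peer_id)
        while len(peers) > self.max_peers:
            peers.popitem(last=False)  # evict the oldest offerer

    def candidates(self, content_key: bytes, exclude: str, now: float) -> List[str]:
        """Peers (oldest first) that offered the key within ttl, minus the failed one."""
        peers = self._seen.get(content_key, OrderedDict())
        return [p for p, t in peers.items() if p != exclude and now - t < self.ttl]


def recover_content(
    tracker: RecentOfferers,
    content_key: bytes,
    failed_peer: str,
    now: float,
    find_content: Callable[[str, bytes], Optional[bytes]],
) -> Optional[bytes]:
    """After a failed accepted transfer, ask recent offerers via FINDCONTENT."""
    for peer in tracker.candidates(content_key, failed_peer, now):
        payload = find_content(peer, content_key)
        if payload is not None:
            return payload
    return None
```

For example, if `"attacker"`, `"honest-1"` and `"honest-2"` all offered a key and the attacker's transfer fails at `now=40.0`, `recover_content` would query `"honest-1"` and then `"honest-2"`, even though their original OFFERs have long since timed out.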
This solution should work great for history network content.
This solution is problematic for state network content because the OFFER and FINDCONTENT payloads differ. For this solution to work in the state network, content keys would need to contain enough information for the receiving node to reconstruct the proof, since it won't have a reference to the block hash that the proof is anchored under...