Closed by hannahhoward 2 years ago
The "resource limit exceeded" error you are getting means the per-peer stream limit is being exceeded.
Off the top of my head, three things could be happening:
Note: the default peer stream limit is 1024, with up to 512 inbound and up to 1024 outbound.
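To make the arithmetic of those limits concrete, here is a minimal sketch of a per-peer stream counter. It is a toy model, not go-libp2p's resource manager: the `PeerStreamScope` class and its method names are hypothetical, and the assumption that all three limits are checked together on every open is mine. Only the numbers 1024 / 512 / 1024 come from the note above.

```python
from dataclasses import dataclass


class ResourceLimitExceeded(Exception):
    """Raised when opening a stream would exceed a per-peer limit."""


@dataclass
class PeerStreamScope:
    """Toy per-peer stream counter (hypothetical, not a libp2p type)."""

    max_total: int = 1024     # default quoted in the note above
    max_inbound: int = 512    # default quoted in the note above
    max_outbound: int = 1024  # default quoted in the note above
    inbound: int = 0
    outbound: int = 0

    def open_stream(self, direction: str) -> None:
        """Count a new stream, or raise if any limit would be exceeded."""
        if direction not in ("inbound", "outbound"):
            raise ValueError(f"unknown direction: {direction!r}")
        new_in = self.inbound + (1 if direction == "inbound" else 0)
        new_out = self.outbound + (1 if direction == "outbound" else 0)
        if (
            new_in > self.max_inbound
            or new_out > self.max_outbound
            or new_in + new_out > self.max_total
        ):
            raise ResourceLimitExceeded(
                f"cannot open {direction} stream: "
                f"{self.inbound} inbound + {self.outbound} outbound already open"
            )
        self.inbound, self.outbound = new_in, new_out

    def close_stream(self, direction: str) -> None:
        """Release a previously counted stream."""
        if direction == "inbound" and self.inbound > 0:
            self.inbound -= 1
        elif direction == "outbound" and self.outbound > 0:
            self.outbound -= 1
        else:
            raise ValueError(f"no open {direction} stream to close")


if __name__ == "__main__":
    scope = PeerStreamScope()
    for _ in range(512):
        scope.open_stream("inbound")
    try:
        scope.open_stream("inbound")  # one past the inbound limit
    except ResourceLimitExceeded as err:
        print("rejected:", err)
```

In this model a stream keeps counting against the peer until `close_stream` is called, so the inbound cap can be reached even while the combined total is still well under 1024.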
In our testing, we observe the following:
Inspecting the logs, we see the GraphSync stream was reset:
and also on autoretrieve:
Miner logs show the miner trying several times, unsuccessfully, to re-establish the stream (this is only an attempt to reconnect, not yet a restart of the data transfer itself):
At the same time, miner logs show multiple other streams between the miner and autoretrieve:
Additional notes:
The connection between the miner and autoretrieve should have Protect called on it, once per transfer, with a different tag for each transfer. When the first transfer finishes, one of these tags is removed, but the other two should still be there.
One additional temporary stream is established at the end of the transfer in order for the miner to ack to autoretrieve that it's finished the transfer. The message is sent from the miner successfully, but does not appear to be received on the client. It appears sending this message happens right before the failure.
Autoretrieve is running the FullRT DHT client (https://github.com/libp2p/go-libp2p-kad-dht/tree/master/fullrt), which crawls the network to build a DHT index. As a result, autoretrieve is connected to a LOT of peers. We did not expect to get any bitswap requests on autoretrieve until we started publishing records in the DHT, but we actually get lots, presumably because we're in the swarm of every peer we connect to while building the DHT index. This presents a potential issue for bitswap chatter should FullRT be deployed widely.
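The per-transfer Protect tagging mentioned in the first note can be sketched as below. This is a small model of tag-based connection protection, written to mirror how the libp2p connection manager's Protect/Unprotect calls behave as far as I understand them: a peer stays protected while at least one tag remains. The `ConnProtector` class, the peer name, and the tag names are all hypothetical and are not taken from the logs.

```python
class ConnProtector:
    """Toy model of tag-based connection protection (hypothetical class)."""

    def __init__(self) -> None:
        # Maps a peer identifier to the set of tags currently protecting it.
        self._tags: dict[str, set[str]] = {}

    def protect(self, peer: str, tag: str) -> None:
        """Add a protection tag for the peer."""
        self._tags.setdefault(peer, set()).add(tag)

    def unprotect(self, peer: str, tag: str) -> bool:
        """Remove one tag; return True if the peer is still protected."""
        tags = self._tags.get(peer)
        if tags is None:
            return False
        tags.discard(tag)
        if not tags:
            del self._tags[peer]
            return False
        return True

    def is_protected(self, peer: str) -> bool:
        """Return True if any tag still protects the peer."""
        return peer in self._tags


if __name__ == "__main__":
    cm = ConnProtector()
    # Three concurrent transfers, one tag each (tag names are made up).
    for tag in ("transfer-1", "transfer-2", "transfer-3"):
        cm.protect("miner", tag)
    still_protected = cm.unprotect("miner", "transfer-1")
    print("protected after first transfer finishes:", still_protected)
```

Under this model, finishing the first of three transfers removes only its own tag, so the connection should remain protected until the last transfer's tag is removed.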