application-research / autoretrieve

A server to make GraphSync data accessible on IPFS
22 stars 7 forks source link

When doing multiple retrievels at once, all retrievals fail after the first one succeeds. #3

Closed hannahhoward closed 2 years ago

hannahhoward commented 2 years ago

In our testing, we observe the following:

  1. When we do a single retrieval with a miner, it finishes
  2. When we do multiple retrievals with the same miner at once, one retrieval fails, but then all other retrievals fail shortly afterwards. Automatic data transfer restarts do not seem to solve the issue.

Inspecting the logs, we see the GraphSync stream was reset:

2022-01-27T01:52:23.174Z    [34mINFO[0m graphsync   impl/graphsync.go:468   Graphsync ReceiveError from 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: stream reset

and also on autoretrieve:

2022-01-27T01:52:23.104Z    DEBUG  filclient   filclient@v0.0.0-20220126072655-8af37db4973c/filclient.go:1256  retrieval event {"dealID": "134345", "rootCid": "bafybeihpbyg4xizni7voackqxomab6444fhwl2gkk7a2tmuphyqg5ip6mq", "miner": "f08403", "name": "ReceiveDataError", "code": 27, "message": "stream reset", "blocksIndex": 10994, "totalReceived": 2291825565}

Miner logs show the miner attempts several times to re-establish the stream unsuccessfully (this is just reattempting to connect, not yet restarting the data transfer itself):

2022-01-27T01:52:28.179Z    INFO   graphsync   messagequeue/messagequeue.go:219    cant open message sender to peer 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: failed to dial 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx:
2022-01-27T01:52:28.180Z    INFO   graphsync   messagequeue/messagequeue.go:219    cant open message sender to peer 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: failed to dial 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx:
2022-01-27T01:52:28.180Z    INFO   graphsync   messagequeue/messagequeue.go:219    cant open message sender to peer 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: failed to dial 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx:
2022-01-27T01:52:28.180Z    INFO   graphsync   messagequeue/messagequeue.go:219    cant open message sender to peer 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: failed to dial 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx:
2022-01-27T01:52:28.180Z    INFO   graphsync   messagequeue/messagequeue.go:219    cant open message sender to peer 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx: failed to dial 12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx:

Miner logs show at the same time, multiple other streams between the miner and autoretrieve:

2022-01-27T01:52:29.306Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:31.175Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:33.785Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:35.349Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:37.225Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:37.935Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:39.002Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:39.866Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:40.507Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}
2022-01-27T01:52:41.069Z    INFO   net/identify    identify/id.go:372  failed negotiate identify protocol with peer    {"peer": "12D3KooWNTiHg8eQsTRx8XV7TiJbq3379EgwG6Mo3V3MdwAfThsx", "error": "stream reset"}

Additional notes:

2022-01-27T01:52:29.236Z    ERROR  autoretrieve    metrics/basic.go:52 Failed to query miner f08403 for Qme5eFv4f1SY7WyuQ3gG6CPDR2cY68JVZjjdweU4Bgi1Mu (root QmezTdYeKyjPFoREStJQQbvATUP8yRJdHMMZx2rZ86p9g9): failed to open stream to peer: peer:12D3KooWBwUERBhJPtZ7hg5N3q1DesvJ67xx9RLdSaStBz9Y6Ny8: resource limit exceeded
vyzo commented 2 years ago

The resource limit exceeded error you are getting is the peer stream limit getting exceeded.

Of the top of my head, three things could be happening:

vyzo commented 2 years ago

Note: the default peer stream limit is 1024, with up to 512 inbound and up to 1024 outbound.