When this issue occurs, there is a single Open data transfer event and then no further events:
2020-11-23T14:06:18.887Z DEBUG markets loggers/loggers.go:35 data transfer event {"name": "Open", "status": "Requested", "transfer ID": 923, "channel ID": "12D3KooWHAN67HgLoqrygQnouFF5uVhwM4kP7zurpR65Powtfgsa-12D3KooWMTh8Cz9dbUik9nQGKE5aVkUhVLfGLm48vu3gaJXQQBYQ-923", "sent": 0, "received": 0, "queued": 0, "received count": 0, "total size": 0, "remote peer": "12D3KooWMTh8Cz9dbUik9nQGKE5aVkUhVLfGLm48vu3gaJXQQBYQ", "event message": "", "channel message": ""}
@magik6k I don't think adding logging will solve this issue. I have a lot of deals stuck in the StartDataTransfer state.
root@li824-220:~# lotus client deal-stats
Total: 791 deals, 1.516 TiB
StorageDealActive: 6 deals, 11.91 GiB
StorageDealProposalAccepted: 1 deals, 1.984 GiB
StorageDealSealing: 383 deals, 760 GiB
StorageDealFundsReserved: 21 deals, 41.67 GiB
StorageDealCheckForAcceptance: 5 deals, 9.922 GiB
StorageDealStartDataTransfer: 125 deals, 237.6 GiB
StorageDealTransferring: 26 deals, 51.59 GiB
StorageDealClientFunding: 1 deals, 1.984 GiB
StorageDealError: 223 deals, 435.6 GiB
Another node of mine has even more stuck in this state
Total: 1157 deals, 2.201 TiB
StorageDealActive: 104 deals, 199.4 GiB
StorageDealSealing: 125 deals, 241.1 GiB
StorageDealFundsReserved: 19 deals, 34.48 GiB
StorageDealStartDataTransfer: 381 deals, 740.9 GiB
StorageDealTransferring: 9 deals, 17.86 GiB
StorageDealError: 519 deals, 1020 GiB
@kernelogic could you please let us know which version of lotus you are running?
It would also be helpful if you could copy in a few of the most common error messages you are seeing.
It appears that the issue is caused in part by a limit (maximum of 6) on the number of in-progress queries in go-graphsync: https://github.com/ipfs/go-graphsync/blob/785e0b3bbdd6b578e3274fb91e5963c219d372a2/responsemanager/responsemanager.go#L25
When the client wants to send files to the miner, the miner opens a graphsync query back to the client and the client responds with the data. If the client is already responding to the maximum number of requests, the new query is queued until a request worker becomes available, and the deal remains in the StorageDealStartDataTransfer state.
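For illustration, here is a minimal Go sketch of the bounded-responder pattern described above. It is not the actual go-graphsync code; it only shows how a fixed pool of workers answers queries while any additional requests wait in the queue, which is what leaves the corresponding deals sitting in StorageDealStartDataTransfer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// maxInProcessRequests mirrors the go-graphsync default described above;
// the real constant lives in responsemanager.go (value 6 at the linked commit).
const maxInProcessRequests = 6

// serveRequests is a sketch of a bounded worker pool: only maxInProcessRequests
// queries are answered concurrently, and any further requests sit in the channel
// until a worker frees up.
func serveRequests(requests <-chan int, respond func(int)) {
	var wg sync.WaitGroup
	for i := 0; i < maxInProcessRequests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range requests {
				respond(req)
			}
		}()
	}
	wg.Wait()
}

func main() {
	requests := make(chan int, 128)
	for i := 0; i < 20; i++ {
		requests <- i
	}
	close(requests)
	serveRequests(requests, func(req int) {
		fmt.Println("responding to request", req)
		time.Sleep(100 * time.Millisecond) // a slow transfer holds its slot
	})
}
```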
I'm using the latest lotus 1.2.1
Suggestions from standup, with three pieces to action:
1) Create a client-side parameter so operators can increase the number of active transfers from the current value of 6 to something larger.
2) Log a WARN-level message when the queue is full so the issue is discoverable (see the sketch below). It could prompt the operator to increase the parameter in #1, invest in more hardware, or take some other action.
3) Double-check that deals that appear "stuck" actually become unstuck once the number of in-progress transfers drops below the parameter. These deals shouldn't error out but should wait for a transfer slot to become available; they may take more time, but they shouldn't fail.
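A rough sketch of what items 1 and 2 could look like. The parameter name maxSimultaneousResponses and the logging setup are assumptions for illustration, not existing lotus or go-graphsync options:

```go
package main

import (
	"log"
)

// maxSimultaneousResponses stands in for the client-side parameter proposed in
// item 1; the name is hypothetical, not an actual lotus/go-graphsync option.
var maxSimultaneousResponses = 6

// enqueueResponse sketches item 2: when every response slot is busy, emit a
// WARN so an operator knows to raise the limit or add hardware, then queue
// the request rather than failing it (item 3).
func enqueueResponse(inProgress int, queue chan<- string, dealID string) {
	if inProgress >= maxSimultaneousResponses {
		log.Printf("WARN: graphsync response queue full (%d in progress); deal %s will wait for a free transfer slot",
			inProgress, dealID)
	}
	queue <- dealID
}

func main() {
	queue := make(chan string, 1024)
	enqueueResponse(6, queue, "deal-a") // logs a WARN, then queues
	enqueueResponse(3, queue, "deal-b") // slot available, queues silently
}
```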
I hope more deals can be transferring at the same time, because some miners can be really slow due to geographical distance. For example, my node's connection can upload at 50 MB/s, but a lot of the time it is only doing 500 KB/s while hundreds of deals sit in StorageDealStartDataTransfer. I think the whole thing is clogged up by a few slow deals.
What could be the result of blindly increasing it to a very large number like 100?
What can I do right now to increase the limit without waiting for an official release? I don't see this file in the lotus repo. It's on the IPFS side.
These are the common error messages:
@kernelogic
Thanks for the explanation. So it's all normal then, just congestion.
When the client creates a deal with the provider:
1. The client opens a push data-transfer request to the provider.
2. The provider opens a graphsync query to the client to pull the deal data.
I believe the root cause of this issue is that the provider does not open a graphsync query to the client in step 2. This could be because
Proposed fix: In go-data-transfer, when opening a push request, set a timeout by which the provider must begin the data transfer, and fail the transfer if the timeout expires.
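A minimal sketch of that timeout pattern in Go. The function name, channel plumbing, and event signalling here are assumptions for illustration rather than the real go-data-transfer API:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// openPushWithTimeout sketches the proposed behaviour: after the client opens a
// push request, wait at most acceptTimeout for the provider's first data-transfer
// event; if nothing arrives, fail the transfer instead of leaving the deal in
// StorageDealStartDataTransfer indefinitely.
func openPushWithTimeout(ctx context.Context, firstEvent <-chan struct{}, acceptTimeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, acceptTimeout)
	defer cancel()

	select {
	case <-firstEvent:
		return nil // provider started responding; let the transfer proceed
	case <-ctx.Done():
		return errors.New("provider did not begin data transfer before the timeout; failing the transfer")
	}
}

func main() {
	events := make(chan struct{}) // never fires in this demo
	err := openPushWithTimeout(context.Background(), events, 2*time.Second)
	fmt.Println("result:", err)
}
```

Failing fast like this turns a silently stuck deal into an actionable error instead of leaving it waiting forever on a provider that never responds.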
@dineshshenoy is this solved? Where?
Should have linked to #5211, which has the proposed fix and will be worked on shortly. Forked that off into a separate item since this issue had gotten unwieldy and a lot of the earlier work has been completed.
Previously running: fe448e
Upgraded to 1.2.0
Tried against the following miners: f09848 f014394 f015927 f02665
Textile was successful in deal transfers on fe448e but is running into problems starting a data transfer on 1.2.0:
Nov 20 02:03:21 bafyreifeu4k2lgizsdqmmqrbuyfow57qqihaeqgtaqnvk4cxfajtjnjuom 0 f09848 StorageDealStartDataTransfer N N baga6ea4seaqpi6qnbmkqf7b7psqheuidjwfxjcqrpy22hvsg3slogbcmf72wsfq 1.984 GiB 0.1045888 FIL 522944 false
Nov 20 02:03:21 bafyreicbcbjvwpimr2xgjov7t5fk4kvvobxfqb6oufozt7hqpoovtarz4y 0 f014394 StorageDealStartDataTransfer N N baga6ea4seaqpi6qnbmkqf7b7psqheuidjwfxjcqrpy22hvsg3slogbcmf72wsfq 1.984 GiB 0.1045976 FIL 522988 false
Nov 20 02:03:21 bafyreifcmj2bhm3cxw4rw22ns3wkwvjxk72njllvty2i36sumvwgeydrqq 0 f015927 StorageDealStartDataTransfer N N baga6ea4seaqpi6qnbmkqf7b7psqheuidjwfxjcqrpy22hvsg3slogbcmf72wsfq 1.984 GiB 0.0103161366 FIL 521017 false
Nov 20 02:06:21 bafyreigsjfa5ynj3v67uxnv3q5m4otvhprtka2hphiqxdj4teqak6sdf6m 0 f02665 StorageDealStartDataTransfer N N baga6ea4seaqd3w2d6namioztl4cegxnhbnvwjivzn5unxmdiegm5cuhgnu5biky 7.938 GiB 0.002090608 FIL 522652 false
Nov 20 03:03:17 bafyreibkoc54gbijomv6poceji6xrxsypy3w2ayz2vgnqwoysbyoirodam 0 f09848 StorageDealStartDataTransfer N N baga6ea4seaqo62sswgtnh7h324wwq4exqc7nwiq33i7rfn4a5whu3uyzncmdyea 1.984 GiB 0.1045648 FIL 522824 false
Nov 20 03:03:17 bafyreieys5gvo5kxa3vv2mq6fzslncfgvj637fisipelcn4to422nqqp2y 0 f014394 StorageDealStartDataTransfer N N baga6ea4seaqo62sswgtnh7h324wwq4exqc7nwiq33i7rfn4a5whu3uyzncmdyea 1.984 GiB 0.1045736 FIL 522868 false
Nov 20 03:03:18 bafyreignxfktgmaa66ypnqr75mreh6jv7wmgy35bcz7acmnbas4jj4kyda 0 f015927 StorageDealStartDataTransfer N N baga6ea4seaqo62sswgtnh7h324wwq4exqc7nwiq33i7rfn4a5whu3uyzncmdyea 1.984 GiB 0.0103137606 FIL 520897 false
Nov 20 03:06:21 bafyreicpnmoc2iee3zb5523rolfthkkw5xvwkwuyjvv4ubog3ckxla5roe 0 f02665 StorageDealStartDataTransfer N N baga6ea4seaqnhd5xfeuwcrndmeb5xhuum6cqfhprlxzdcetgqqjnbqrgey6mqca 7.938 GiB 0.002090128 FIL 522532 false
Nov 20 04:03:22 bafyreigqr6ifsqwycag3yn2xzicojnnvbt57w2xnwtzidgkmomqmkowv5m 0 f09848 StorageDealStartDataTransfer N N baga6ea4seaqnh3pbmml4o44nmuzb4tp3t4gv7kve2sugf4cb5yrsatshzmn2eki 1.984 GiB 0.1045408 FIL 522704 false