rjan90 closed this issue 3 years ago
After experiencing another batch of deals stuck in the "StorageDealsTransferring" state with the same client, I think we have pinpointed at least one reason why this happens.
I restarted both my lotus daemon and miner for maintenance, and the next storage-deal proposal I received after restarting the miner got stuck in "StorageDealsTransferring". To resolve it, the client also restarted his lotus process, after which the transfers started.
So steps to reproduce:
Thank you for the detailed bug report. I believe this will be fixed by https://github.com/filecoin-project/lotus/pull/5210
I want to write some comprehensive tests around restarts to make sure we've covered all cases before merging this fix, so it may take a few more days.
Describe the bug
A couple of miners found that deals with the same client suddenly began stalling in "StorageDealsTransferring" with 0 bytes transferred, after they had successfully sealed deals from that client over many days. This issue is based on the feedback and logs from both the miners and the client in the Slack thread https://filecoinproject.slack.com/archives/C01AZP8BKRQ/p1609164686470600
On the miner side
Data-Transfers shows that the transfer is ongoing, but with 0 bytes transferred:
My miner has a connection with the client:
If I try to restart the channel I can see in the logs that the channel has timed out:
On the client-side
Client Deals
Dec 28 21:47:07 bafyreigpl5vm64hhowjc2g5e7reppu6hngme5ff4lkaa3ykd6h2r5zveza 0 f023467 StorageDealStartDataTransfer N N baga6ea4seaqg3tsrp56bchrhumenu4cjnl57cpxudfiggettmlejqggtwf74coa 15.88 GiB 0 FIL 1052529 true
Client Transfers
426 Requested ...EqS7UtJw ...peoy6mfa Y 0B ...7reppu6hngme5ff4lkaa3ykd6h2r5zveza"}}
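When many transfers are listed, rows like the one above can be flagged automatically. The following is a minimal Python sketch, not part of lotus. The column order (ID, status, peer, root CID, initiated flag, bytes transferred, voucher) is an assumption inferred from the single row shown here, and the helper names `parse_transfer_row` and `is_stalled` are hypothetical.

```python
def parse_transfer_row(row: str) -> dict:
    """Split one whitespace-separated transfer row into named fields.

    Assumed column order (inferred from the single row in this report):
    ID, Status, Peer, Root CID, Initiated?, Transferred, Voucher.
    """
    # Split on whitespace at most 6 times so the voucher stays in one piece.
    fields = row.split(None, 6)
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    return {
        "id": int(fields[0]),
        "status": fields[1],
        "peer": fields[2],
        "root_cid": fields[3],
        "initiated": fields[4] == "Y",
        "transferred": fields[5],
        "voucher": fields[6] if len(fields) > 6 else "",
    }


def is_stalled(transfer: dict) -> bool:
    """Flag a transfer that is still 'Requested' and has moved no bytes."""
    return transfer["status"] == "Requested" and transfer["transferred"] == "0B"


if __name__ == "__main__":
    row = '426 Requested ...EqS7UtJw ...peoy6mfa Y 0B ...7reppu6hngme5ff4lkaa3ykd6h2r5zveza"}}'
    print(is_stalled(parse_transfer_row(row)))
```

A single snapshot cannot distinguish a stalled transfer from one that has only just been requested, so in practice the check would need to be repeated over time before a row is treated as stuck.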
The client checked his network bandwidth and had no problems using wget or scp to another machine. However, nethogs showed no clearly identifiable lotus transfer traffic.
The client also checked his socket limit:
Possible problems might be:
To Reproduce
Steps to reproduce the behavior:
Version (run lotus version):
Daemon: 1.4.0+git.e9989d0e4+api1.0.0
Local: lotus version 1.4.0+git.e9989d0e4
Additional context
Discussion is also ongoing in the Slack channel. This issue also seems to be related to #4946 and #5211.