Closed Angelo-gh3990 closed 5 months ago
So I found out that some retrievals work, but the one's that don't work start downloading and then give an timeout error like mentioned above... an error why the output file is not saved would be helpful, because you see it downloading.
Would you be able to get miner logs after ./lotus-miner log set-level --system dagstore debug
?
Also lotus-miner pprof goroutines > goroutines.txt
while the transfer is running
Also, is your dagstore initialized? (see ./lotus-miner dagstore list-shards
, all shards should be in ShardStateAvailable)
when I run : lotus-miner dagstore list-shards
All are in "ShardStateAvailable"
Some are in "ShardStateServing"
And some in "ShardStateNew"
When I run : lotus-miner dagstore initialize-all --concurrency=1
, it doesn't try to initialize more, so I assume the "ShardStateNew" will transition into "ShardStateAvailable" on it's own someday / somehow ? idk
I'll send you the logg's via slack as a DM, just waiting for the retrieval to timeout (40min)
@magik6k , thanks a million for looking into this
Thanks, looked at the goroutines, and they seem to point at something weird with sector locks.
Would you be able to reproduce this with https://github.com/filecoin-project/lotus/pull/7735 applied, and then get output of lotus-miner storage locks
?
Building now, I will send you the info in slack later today. Thanks @magik6k
I'm getting :
Before retrieval :
2976 PreCommit2 sealed(R:1) cache(W)
During retrieval :
2976 PreCommit2 sealed(R:1) cache(W)
During time-out wait (5min) :
2976 PreCommit2 sealed(R:1) cache(W)
2986 PreCommit2 sealed(R:1) cache(W)
During time-out wait (25min) : -- no sectors seen
After 40min time-out :
2987 AddPiece unsealed(W)
After stopping (ctrl-c) the retrieval :
2987 AddPiece unsealed(W)
Can you try with https://github.com/filecoin-project/lotus/pull/7737 applied?
@magik6k , yes I can in a couple of hours. Thanks
@magik6k , I tried #7737 , but the same result... Download stops after while. Any logging that you need?
with the new v1.13.2-rc2 I get the following when I cancel after timeout :
2021-12-08T23:20:05.804+0100 INFO graphsync impl/graphsync.go:468 Graphsync ReceiveError from 12D3KooWQnQNnxvKgReSftX9o7LAWsq8FHUFtB3kxEKH6rqvmDuu: stream reset
2021-12-08T23:20:05.804+0100 INFO graphsync messagequeue/messagequeue.go:252 graphsync send error: stream reset
2021-12-08T23:20:05.805+0100 INFO graphsync messagequeue/messagequeue.go:219 cant open message sender
2021-12-08T23:20:05.808+0100 INFO graphsync responsemanager/server.go:230 response failed: %!w(queryexecutor.errorString=network error)
Is it still a issue for you on v1.13.2-rc4 ? @Angelo-gh3990
@Reiers yes :-)
I just tried some other ones with filc and still having the same issue on rc4.
If this still happens on latest master, would you be able to provide goroutines again, like in https://github.com/filecoin-project/lotus/issues/7732#issuecomment-985349585?
@magik6k I send you a DM, test on version v1.13.2-rc5 is still the same. I'll test later with the master version if needed?
@magik6k @jennijuju @Reiers ok so updated to v1.13.2-rc7 updated filc to latest master ran : sysctl -w net.core.rmem_max=2500000 for bigger buffer started download, download is insanely fast ... after a while (several GB's) it slow's down a lot but keeps going Download was successful this time .... yay :-) :-) Maybe more testing is needed, but look promising :-)
Did some tests on on rc7, using a couple vm´s on gcloud for filc: Retrieving to US-Central1 from Norway:
./filc retrieve --no-ipfs=true --miner=f023467 uAXESIBrWesax1xXLKARn_ped66Q58PRUHGXkNc0Xp9kSfNFA
08:24:38 INFO Querying FIL retrieval candidates...
2022/01/06 08:24:38 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
08:24:41 INFO Got back 1 retrieval query results of a total of 1 candidates
08:24:41 INFO Attempting FIL retrieval with miner f023467 from root CID bafyreia22z5mnmoxcxfsqbdh72lz325ehhypiva4mxsdltixu7mre7gria (0 FIL)
08:25:27 INFO FIL retrieval succeeded
RETRIEVAL STATS (FIL)
-----
Size: 925308135 (882 MiB)
Duration: 45.089918131s
Average Speed: 20521397 (20 MiB/s)
Ask Price: 0 (0 FIL)
Total Payment: 0 (0 FIL)
Num Payments: 0
Peer: 12D3KooWLVyjYwoWCPYmvnCkAzwmYBmPAR2zb7g1DT3eALWU2twY
unknown node type
Retrieving to Europe-North(Finland) from Norway:
./filc retrieve --no-ipfs=true --miner=f023467 uAXESIBrWesax1xXLKARn_ped66Q58PRUHGXkNc0Xp9kSfNFA
2022/01/06 08:44:19 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
08:44:20 INFO Querying FIL retrieval candidates...
08:44:20 INFO Got back 1 retrieval query results of a total of 1 candidates
08:44:20 INFO Attempting FIL retrieval with miner f023467 from root CID bafyreia22z5mnmoxcxfsqbdh72lz325ehhypiva4mxsdltixu7mre7gria (0 FIL)
08:44:57 INFO FIL retrieval succeeded
RETRIEVAL STATS (FIL)
-----
Size: 520689159 (497 MiB)
Duration: 8.970264838s
Average Speed: 58046130 (55 MiB/s)
Ask Price: 0 (0 FIL)
Total Payment: 0 (0 FIL)
Num Payments: 0
Peer: 12D3KooWLVyjYwoWCPYmvnCkAzwmYBmPAR2zb7g1DT3eALWU2twY
unknown node type
I did not have any success when I tried to retrieve using PieceCID baga..
, they just timed out. And also, I had even better performance (average speed), when trying to retrieve just the root CID.
@rjan90 A quick test here in the EU from google pulling data from my dc in Brussels
./filc retrieve --no-ipfs=true --miner=f0xxxxx mAXCg5AIgzmWXB+M4hE8NojRdaMK/lGivppRvBzA4CabNRipbnu0
05:11:22 INFO Querying FIL retrieval candidates...
05:11:23 INFO Got back 1 retrieval query results of a total of 1 candidates
05:11:23 INFO Attempting FIL retrieval with miner f0xxxxx from root CID bafykbzacedhglfyh4m4iitynui2f22gcx8kgrl5gsrxpymbybgtm2rrklopo2 (0 FIL)
05:15:04 INFO FIL retrieval succeeded
RETRIEVAL STATS (FIL)
-----
Size: 21475902791 (20 GiB)
Duration: 3m40.932077383s
Average Speed: 97205906 (93 MiB/s)
Ask Price: 0 (0 FIL)
Total Payment: 0 (0 FIL)
Num Payments: 0
Peer: 12D3KooWAkfsqw9HxWHcNhtFSmqeVcVbibdqaV7sHURYxYyyoMkT
@rjan90 I forgot, did you set the bigger buffer size ? https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size sysctl -w net.core.rmem_max=2500000
@rjan90 I forgot, did you set the bigger buffer size ?
@Angelo-gh3990 No I forgot that! Will try a new run, might speed things up even more!
@Angelo-gh3990, how do you guys run the node with quic? When I add a quic address logs say that there is no transport registered for such address.
Also, quic by default is only in libp2p 0.16, while lotus(0.13.2-rcx) relies on 0.15. The line https://github.com/filecoin-project/lotus/blob/v1.13.2-rc7/node/modules/lp2p/transport.go#L12 is not used anywhere, thus I get the no transport registered error. Is this intended btw cc @magik6k ?
Hey @Angelo-gh3990
Just checking in. Any improvement on the newer Lotus releases?
Many thanks!
Closing this ticket as the Legacy Lotus/Lotus-Miner Markets sub-system reached EOL at the end of the 31st January 2023, and is being completely removed from the code here:
Checklist
Latest release
, or the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.Lotus component
Lotus Version
Describe the Bug
I get a timeout when trying to do a retrieve of a deal: (it does not gets saved)
/filclient/filc# ./filc retrieve --no-ipfs=true --miner=f000000 mAghSIFPWgF2bFIg/oIZEPypBRg8mjRSqdNtRrJKbNTMqX1dy -o /raid1-data/filclient/filc/
21:52:03 INFO Querying FIL retrieval candidates... 21:52:04 INFO Got back 1 retrieval query results of a total of 1 candidates 21:52:04 INFO Attempting FIL retrieval with miner f000000 from root CID bafybeict22af3gyura72bvseh4vecrqpe9grjktu3ni2zeu3guzsux2xoi (0 FIL) 5440040661 (5.1 GiB))
22:34:07 ERROR 12D3KooWQnQNnyvKgReSftX4o7LAWsq8FHUFtB3kxEKH6rqvmDuu-12D3KooWAkfsqw9HxWHcNhtFSmqeVcVmopdqaV7sHURYxYyyoMkT-1638478323675277129: closing data-transfer channel: 12D3KooWQnQNnyvKgReSftX4o7LAWsq8FHUFtB3kxEKH6rqvmDuu-12D3KooWAkfsqw9HxWHcNhtFSmqeVcVmopdqaV7sHURYxYyyoMkT-1638478323675277129: timed out waiting 40m0s for Complete message from remote peer
Logging Information
Repo Steps