filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/
Other
2.83k stars 1.25k forks source link

Cannot perform any retrieval for 32GB files #5409

Closed kernelogic closed 3 years ago

kernelogic commented 3 years ago

Describe the bug I tried several retrievals to several different miners for the deals I made. None of them went through. I do not believe they all offline. I suspect it has something to do with the deal size. These are all 32GB offline deals, made with --fast-retrieval=false.

Example 1 - stuck after DealStatusOngoing Provider is doing unsealing but it takes hours without going further

lotus client retrieve --miner f024008 bafykbzaced6noeziyglm2frycaknydeijbimvhpk72eimhfupoxd72dehp2k4 abc

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelCreateInitiated (DealStatusPaymentChannelCreating) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelReady (DealStatusPaymentChannelAllocatingLane) Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)

Example 2 - unmarshalling error The provider told me his miner crashed upon receiving my retrieval request

lotus client retrieve --miner f083550 bafykbzacecu4qt4tlr5vqojgtlt553rbskqremq67z37x5cufaonjqfh5y7no mysql-2016-04-19.tar.gz.partbl 2021-01-25T04:59:20.145Z WARN rpc go-jsonrpc@v0.1.2/client.go:541 unmarshaling failed {"message": "{\"Err\":\"exhausted 5 attempts but failed to open stream, err: failed to dial 12D3KooWFJ6iPAiW82pR7REB8pJfKAsoKoEAhgWKYduoQT1734a9: all dials failed\n [/ip4/127.0.0.1/tcp/45989] dial tcp4 127.0.0.1:45989: connect: connection refused\n [/ip6/::1/tcp/41337] dial tcp6 [::1]:41337: connect: connection refused\n [/ip4/59.12.56.212/tcp/45989] dial tcp4 59.12.56.212:45989: connect: connection refused\n [/ip4/59.12.56.215/tcp/45989] dial tcp4 0.0.0.0:33463-\u003e59.12.56.215:45989: i/o timeout\",\"Root\":null,\"Piece\":null,\"Size\":0,\"MinPrice\":\"\u003cnil\u003e\",\"UnsealPrice\":\"\u003cnil\u003e\",\"PaymentInterval\":0,\"PaymentIntervalIncrease\":0,\"Miner\":\"f083550\",\"MinerPeer\":{\"Address\":\"f083550\",\"ID\":\"12D3KooWFJ6iPAiW82pR7REB8pJfKAsoKoEAhgWKYduoQT1734a9\",\"PieceCID\":null}}"} ERROR: RPC client error: unmarshaling result: failed to parse big string: '"\u003cnil\u003e"'

Example 3 - incomplete response One of the provider told me his miner is set as allowUnseal=false but his worker is set as allowUnseal=true, yet the retrieval failed before the worker picked up the unseal.

lotus client retrieve --miner f064218 baga6ea4seaqgsbrsupr6az5zf2yjqai5t4xtgn7lf2zd62wnxxq2mgyhlvbxyda publicdomainmovies.tar.05

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventUnsealPaymentRequested (DealStatusAccepted) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane) Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing) Recv: 0 B, Paid 0 FIL, ClientEventPaymentRequested (DealStatusFundsNeeded) Recv: 0 B, Paid 0 FIL, ClientEventSendFunds (DealStatusSendFunds) Recv: 0 B, Paid 0.1 FIL, ClientEventPaymentSent (DealStatusOngoing) Recv: 0 B, Paid 0.1 FIL, ClientEventDataTransferError (DealStatusErrored) Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

lotus client retrieve --miner f01278 bafykbzaceaj7ube4k2vhniqgdb6vq7ggcoebf4c6punse7ike2almme6esdxi mysql-2018-11-01.tar.gz.partcy

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventUnsealPaymentRequested (DealStatusAccepted) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelCreateInitiated (DealStatusPaymentChannelCreating) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelReady (DealStatusPaymentChannelAllocatingLane) Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing) Recv: 0 B, Paid 0 FIL, ClientEventPaymentRequested (DealStatusFundsNeeded) Recv: 0 B, Paid 0 FIL, ClientEventSendFunds (DealStatusSendFunds) Recv: 0 B, Paid 0.000000000000000002 FIL, ClientEventPaymentSent (DealStatusOngoing) Recv: 0 B, Paid 0.000000000000000002 FIL, ClientEventDataTransferError (DealStatusErrored) Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

Example 4 - miner is not accepting online retrieval deals lotus client retrieve --miner f047419 baga6ea4seaqgsbrsupr6az5zf2yjqai5t4xtgn7lf2zd62wnxxq2mgyhlvbxyda publicdomainmovies.tar.05

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventDealRejected (DealStatusRetryLegacy) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptanceLegacy) Recv: 0 B, Paid 0 FIL, ClientEventDealRejected (DealStatusRejected) Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) ERROR: retrieval failed: Retrieve: Retrieval Proposal Rejected: deal rejected: miner is not accepting online retrieval deals

Example 5 - normal shutdown of state machine This provider is able to serve my retrieval for a 8GB file, but not 32GB ones.

lotus client retrieve --miner f022352 baga6ea4seaqoyyc52ajdq2v7723d3uixeem5ulwvbtv7h45776vrrg53nnf2gjq trusted_setup_phase2.tar.47

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane) Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing) Recv: 0 B, Paid 0 FIL, ClientEventProviderCancelled (DealStatusCancelling) Recv: 0 B, Paid 0 FIL, ClientEventDataTransferError (DealStatusErrored) Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: unable to send cancel to channel FSM: normal shutdown of state machine

lotus version 1.4.1+git.d6c06881e

flyworker commented 3 years ago

Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) ERROR: retrieval failed: Retrieve: Retrieval Error: error generated by data transfer: deal data transfer failed: incomplete response

dgreuel2002 commented 3 years ago

This is the error I get when running,

lotus-miner retrieval-deals list 1 (1).txt

Also, This is my miner log when the message hit and then it came to an abrupt stop!

2.txt

dirkmc commented 3 years ago

Thank you for the bug reports, it's very helpful when they're so well structured like this 👍

@dgreuel2002 I believe the crash you're seeing is coming from this line: https://github.com/filecoin-project/go-fil-markets/blob/master/retrievalmarket/impl/requestvalidation/revalidator.go#L87 Probably because deal.PricePerByte is zero. Is your miner accepting deals for free? Related: https://github.com/filecoin-project/lotus/issues/3420

@kernelogic

Example 1 - stuck after DealStatusOngoing

Is it possible there are messages stuck in the message pool preventing the deal from proceeding?

Example 2 - unmarshalling error

This sounds like a crash on the miner side, as you suggested.

We will investigate all these issues, thanks again for the detailed report.

kernelogic commented 3 years ago

@dirkmc do you mean his deal price for "retrieval" is free? How to set the retrieval price anyway? Is it always the same as storage price for unverified deals? I looked into the help of "lotus-miner storage set-ask" but there's no mention of retrieval.

USAGE: lotus-miner storage-deals set-ask [command options] [arguments...]

OPTIONS: --price PRICE Set the price of the ask for unverified deals (specified as FIL / GiB / Epoch) to PRICE. --verified-price PRICE Set the price of the ask for verified deals (specified as FIL / GiB / Epoch) to PRICE --min-piece-size SIZE Set minimum piece size (w/bit-padding, in bytes) in ask to SIZE (default: 256B) --max-piece-size SIZE Set maximum piece size (w/bit-padding, in bytes) in ask to SIZE (default: miner sector size)

dgreuel2002 commented 3 years ago

@kernelogic Kernelogic it is lotus-miner retrieval deals set-ask

@dirkmc yes I do have it set to 0 and price per byte set to .0000000000002

I have no messages stuck as the rest of miner operations are fine and local mpool is clear. the miner abruptly haults as soon as the unseal message hits it. The log I attached is the end of the kiner log as it had haulted and didnt indicate to why. Please let me know if you need more info on this. @stuberman also has this same error when he runs lotus-miner retrieval-deals list.

I am on the road but please reach out for more info if needed.

dgreuel2002 commented 3 years ago

@dirkmc @kernelogic @jennijuju Here is my get-ask lotus-miner retrieval-deals get-ask Price per Byte Unseal Price Payment Interval Payment Interval Increase 0 FIL 0.000000000000000002 FIL 1MiB 1MiB

stuberman commented 3 years ago

I am storing both sealed and unsealed sectors. I my miner logs I see attempts to retrieve deals, both from unsealed sectors and sealed sectors that all seem to fail. Two sectors transferred to my worker/sealing repo to unseal and are still sitting there idle (sectors 305 and 440). My miner logs also show an unsealed retrieval attempt (sector 330) which also fails.

Here is a sample of logs for sector 330 on miner

2021-01-25T04:53:03.039Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentReceived", "deal ID": "8", "receiver": "12D3KooWEdkgB5RQCAZToi4d3U9GSYwjLR38tqaNVtrwy5KGH81q", "state": "DealStatusUnsealing", "message": ""} 2021-01-25T04:53:03.065Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows 2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}] 2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 1]] 2021-01-25T04:53:03.068Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0 2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unsealread to window 0 2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330 2021-01-25T04:53:03.069Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"} 2021-01-25T04:53:03.081Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T04:53:03.093Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={}) 2021-01-25T04:53:03.105Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T04:53:03.140Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}] 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 3]] 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unseal to window 0 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330 2021-01-25T04:53:03.147Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T04:53:03.156Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:6; a:0): {{0 0} /pool/storage/sealed/s-t01278-330 /pool/storage/cache/s-t01278-330} 2021-01-25T04:53:03.174Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T04:53:03.175Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"} 2021-01-25T04:53:03.175Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T04:53:03.191Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T04:53:03.191Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:6; a:0): {{0 0} /pool/storage/sealed/s-t01278-330 /pool/storage/cache/s-t01278-330} 2021-01-25T04:53:03.199 INFO filcrypto::proofs::api > unseal_range: start 2021-01-25T04:53:03.202 INFO filecoin_proofs::api > unseal_range:start 2021-01-25T04:53:06.493Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0) 2021-01-25T04:53:30.046Z INFO graphsync impl/graphsync.go:299 Graphsync ReceiveError: stream reset 2021-01-25T04:53:37.258Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 7s (nulls: 0) 2021-01-25T04:54:06.311Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0) 2021-01-25T04:54:23.446 INFO storage_proofs_porep::stacked::vanilla::graph > using parent_cache[1073741824 / 1073741824] 2021-01-25T04:54:23.449 INFO storage_proofs_porep::stacked::vanilla::cache > parent cache: opening /home/stuart/parms/v28-sdr-parent-21981246c370f9d76c7a77ab273d94bde0ceb4e938292334960bce05585dc117.cache, verify enabled: false 2021-01-25T04:54:23.449 INFO storage_proofs_porep::stacked::vanilla::proof > multi core replication 2021-01-25T04:54:23.450 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > create labels 2021-01-25T04:54:23.501 INFO storage_proofs_porep::stacked::vanilla::memory_handling > initializing cache 2021-01-25T04:54:23.501 WARN storage_proofs_porep::stacked::vanilla::memory_handling > failed to lock map Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }, falling back 2021-01-25T04:54:23.503 WARN storage_proofs_porep::stacked::vanilla::memory_handling > failed to lock map Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }, falling back 2021-01-25T04:54:23.503 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Layer 1 2021-01-25T04:54:23.503 INFO storage_proofs_porep::stacked::vanilla::create_label::multi > Creating labels for layer 1 ... (more porep logs) 2021-01-25T08:18:27.278 INFO filecoin_proofs::api > unseal_range:finish 2021-01-25T08:18:28.970 INFO filcrypto::proofs::api > unseal_range: finish 2021-01-25T08:18:28.997Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}] 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 1]] 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 330 to window 0 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 330 task seal/v0/unsealread to window 0 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 330 2021-01-25T08:18:29.000Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "65d50ee7-57fb-40bc-86c4-ec86ab936cbd"} 2021-01-25T08:18:29.025Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T08:18:29.042Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 330} 8} (e:1; a:0): {{0 0} /pool/storage/unsealed/s-t01278-330 } 2021-01-25T08:18:36.008Z INFO miner miner/miner.go:384 Time delta between now and our mining base: 6s (nulls: 0) 2021-01-25T08:18:37.358Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "8", "receiver": "12D3KooWEdkgB5RQCAZToi4d3U9GSYwjLR38tqaNVtrwy5KGH81q", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"} 2021-01-25T08:18:46.451Z WARN storagemarket_network network/deal_status_stream.go:26 stream reset 2021-01-25T08:18:46.451Z ERROR storagemarket_impl impl/provider.go:464 failed to read DealStatusRequest from incoming stream: stream reset

Logs for sector 440 from miner

2021-01-25T05:48:20.221Z INFO dt-impl impl/events.go:338 received new channel request from 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv 2021-01-25T05:48:20.355Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentRequested", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFundsNeededUnseal", "message": ""} 2021-01-25T05:48:20.488Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDealAccepted", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFundsNeededUnseal", "message": ""} 2021-01-25T05:48:20.803Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventPaymentReceived", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusUnsealing", "message": ""} 2021-01-25T05:48:20.815Z DEBUG advmgr sector-storage/sched.go:354 SCHED 1 queued; 4 open windows 2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:449 SCHED windows: [{allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]} {allocated:{memUsedMin:0 memUsedMax:0 gpuUsed:false cpuUse:0 cond:} todo:[]}] 2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:450 SCHED Acceptable win: [[0 3]] 2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:465 SCHED try assign sqi:0 sector 440 to window 0 2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched.go:472 SCHED ASSIGNED sqi:0 sector 440 task seal/v0/unseal to window 0 2021-01-25T05:48:20.824Z DEBUG advmgr sector-storage/sched_worker.go:368 assign worker sector 440 2021-01-25T05:48:20.825Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T05:48:20.833Z DEBUG advmgr sector-storage/sched_worker.go:278 task done {"workerid": "5036f473-8aed-478a-b65e-3b1f3281ed20"} 2021-01-25T05:48:20.833Z DEBUG advmgr sector-storage/sched.go:354 SCHED 0 queued; 4 open windows 2021-01-25T05:48:20.839Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventUnsealError", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusFailing", "message": "storage call error 0: unsealing sector: acquire unsealed sector path (existing): allocate local sector for fetching: couldn't find a suitable path for a sector"} 2021-01-25T05:48:20.888Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={}) 2021-01-25T05:48:20.956Z INFO dt-impl impl/impl.go:315 close channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229 2021-01-25T05:48:20.981Z WARN dt-impl impl/impl.go:323 unable to close channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229: channel not found 2021-01-25T05:48:20.981Z INFO dt-impl impl/impl.go:326 12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7: sending close channel to 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv for channel 12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv-12D3KooWQvdhZSSArWTSBiyjVZa3mAmzoLHXV2QS9nFoaGLv3fL7-2229 2021-01-25T05:48:20.997Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "8", "receiver": "12D3KooWEvhqEU2TReremvL7Pd8xz2gEwsGMz9jtdj5d3Dhi1Dpv", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"}

Logs from lotus-worker for 440

2021-01-25T05:38:36.418Z INFO stores stores/remote.go:224 Fetch http://192.168.1.95:2345/remote/sealed/s-t01278-440 -> /seal/worker/sealed/fetching/s-t01278-440 2021-01-25T05:43:30.991Z DEBUG stores stores/util_unix.go:28 move sector data {"from": "/seal/worker/sealed/fetching/s-t01278-440", "to": "/seal/worker/sealed/s-t01278-440"} 2021-01-25T05:43:30.994Z INFO stores stores/remote.go:224 Fetch http://192.168.1.95:2345/remote/cache/s-t01278-440 -> /seal/worker/cache/fetching/s-t01278-440 2021-01-25T05:43:31.905Z DEBUG stores stores/util_unix.go:28 move sector data {"from": "/seal/worker/cache/fetching/s-t01278-440", "to": "/seal/worker/cache/s-t01278-440"} 2021-01-25T05:43:31.908Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 440} 8} (e:6; a:0): {{0 0} /seal/worker/sealed/s-t01278-440 /seal/worker/cache/s-t01278-440} 2021-01-25T05:48:20.831Z DEBUG advmgr sector-storage/worker_local.go:129 acquired sector {{1278 440} 8} (e:6; a:0): {{0 0} /seal/worker/sealed/s-t01278-440 /seal/worker/cache/s-t01278-440}

kernelogic commented 3 years ago

That align with my attempt, @stuberman . All end up with deal data transfer failed: incomplete response.

s0nik42 commented 3 years ago

have you tried on fast-retrieval=true deals ?

s0nik42 commented 3 years ago

fast-retrieval=true just send you a local copy of the file that doesn't required any computation

s0nik42 commented 3 years ago

I'm going to try now. Let you know.

kernelogic commented 3 years ago

No I haven't tried. I don't want to waste too much space of my providers so all my slingshot deals were sent as fast-retrieval=false. Now this decision might come back bite me as slingshot judges might deduct rewards for failed retrievals.

s0nik42 commented 3 years ago

seems like miners are now blocking retrieval deals to protect their miners ...

kaptinlin commented 3 years ago

We have the same unmarshalling error issue.

ERROR: RPC client error: unmarshaling result: failed to parse big string: '"\u003cnil\u003e"'
dgreuel2002 commented 3 years ago

I am still getting this when I do "lotus-miner retrieval-deals list" this happened after @kernelogic sent me this deal in this post and it crashed miner....has there been any movement on this or should I open a separate issue? @s0nik42

3.txt

dirkmc commented 3 years ago

@dgreuel2002 the fix is in https://github.com/filecoin-project/lotus/pull/5493, however this has not been released yet. It should go out in the next lotus release.

s0nik42 commented 3 years ago

@dirkmc are you talking about

dgreuel2002 commented 3 years ago

@s0nik42 I have updated to the new version and am still getting the lotus-miner retrieval-deals list output garbage

s0nik42 commented 3 years ago

yes, I'm not sure these 2 issues are related

dirkmc commented 3 years ago

Sorry I didn't look at the log, I just read your message.

https://github.com/filecoin-project/lotus/pull/5493 should fix the crash when deal price per byte is zero

jennijuju commented 3 years ago

We believe the unsealing issue is in the proof instead of lotus, track here https://github.com/filecoin-project/rust-fil-proofs/issues/1402.

neondragon commented 3 years ago

I am seeing 'Stuck after DealStatusOngoing' on my miner, when trying to retrieve a file that is already unsealed. The miner tries to unseal it again.

lotus-miner version 1.5.2+mainnet+git.cd293aa72.dirty

Client:

~# lotus client retrieve --miner f019551 bafykbzaceb3lrwftnl5vn7msktfal44jqexs2yz6z2kkz4xvnh7cnf5towzdk test.file Recv: 0 B, Paid 0 FIL, ClientEventOpen (DealStatusNew) Recv: 0 B, Paid 0 FIL, ClientEventDealProposed (DealStatusWaitForAcceptance) Recv: 0 B, Paid 0 FIL, ClientEventDealAccepted (DealStatusAccepted) Recv: 0 B, Paid 0 FIL, ClientEventPaymentChannelAddingFunds (DealStatusPaymentChannelAllocatingLane) Recv: 0 B, Paid 0 FIL, ClientEventLaneAllocated (DealStatusOngoing)

miner.log:

{"level":"info","ts":"2021-03-17T09:23:14.227Z","logger":"dt-impl","caller":"impl/events.go:338","msg":"received new channel request from 12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5"} {"level":"info","ts":"2021-03-17T09:23:14.236Z","logger":"markets","caller":"loggers/loggers.go:30","msg":"retrieval provider event","name":"ProviderEventOpen","deal ID":"1","receiver":"12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5","state":"DealStatusNew","message":""} {"level":"info","ts":"2021-03-17T09:23:14.237Z","logger":"markets","caller":"loggers/loggers.go:30","msg":"retrieval provider event","name":"ProviderEventDealAccepted","deal ID":"1","receiver":"12D3KooWGNLxbsJXQYfaSnMRemgVaVYFCBr9w9NTkYaKmLipY6g5","state":"DealStatusUnsealing","message":""}

~# lotus-miner sealing jobs

ID Sector Worker Hostname Task State Time 4fe081b0 1369 1224d871 gb-wlv1-filecoin2 RD running 19m59.9s

~# lotus-miner storage find 1369

In 2aaec1c2-9ad6-4563-8f59-0ac57179649e (Unsealed, Sealed, Cache) Sealing: false; Storage: true ...

UPDATE: After a while, the lotus client began to download the data. The cause of the delay seems to be lotus-miner taking a long time copying the unsealed sector from the storage location into .lotusminer/datastore/staging/ before starting to send it to the client. The deal status transitioned through "DealStatusUnsealing" which was misleading -- no actual unseal task was scheduled or run. The transfer is in progress (13GB so far), and is certainly a lot slower than a file transfer could be on the local 10GbE network, but it is working.

UPDATE2: The 32GiB download completed, it just took quite a long time. It also seemed to start faster and then get progressively slower the closer it got to 32GiB.

flyworker commented 3 years ago

with lotus 1.5.0

Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.349Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventUnsealComplete", "deal ID": "12", "receiver": "12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA", "state": "DealStatusUnsealed", "message": ""} Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.376Z INFO dt-impl impl/impl.go:370 resume channel 12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA-12D3KooWHhJjBsoAom7ong97Q4FJeNP2TbCdKCsYvnFB7DHMn5RM-12 Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.376Z INFO graphsync responsemanager/responsemanager.go:477 response failed: %!w(traversal.SkipMe={}) Mar 17 15:42:11 miner7 lotus-miner[1701810]: 2021-03-17T15:42:11.442Z INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDataTransferError", "deal ID": "12", "receiver": "12D3KooWLb3vya3V3ZU39jZRfVsTEY6KB93czht8AdVAF3pMCgqA", "state": "DealStatusErrored", "message": "deal data transfer failed: incomplete response"}

dkkapur commented 3 years ago

@kernelogic - any updates on this based on changes introduced in 1.6 and 1.7? Is this still an ongoing problem?

dkkapur commented 3 years ago

Going to close this out for the time being. Please open a new Issue for specific issues you may still be experiencing.