filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Provide backoff from storage-sealing fsm to markets with respect to MaxSealingSectorsForDeals config value #3726

Closed lanzafame closed 4 years ago

lanzafame commented 4 years ago

Describe the bug
On restart of the miner, the markets actor will take any deals that were in staging and add them to sectors without taking the MaxSealingSectorsForDeals config value into consideration.

Having some way for the markets actor to back off and retry would prevent this.

@aarshkshah1992 @arajasek

Source of bug https://filecoinproject.slack.com/archives/C0179RNEMU4/p1599718261135300

aarshkshah1992 commented 4 years ago

cc @hannahhoward.

aarshkshah1992 commented 4 years ago

Logs at:

https://gist.github.com/karalabe/8e3b0de5d65f2254389b873398e1b699

karalabe commented 4 years ago

Here are the Space Race deal failures: https://spacerace.filecoin.io/deals/t019100

Here are the sector lists:

0: Proving     sSet: YES active: YES tktH: 41911 seedH: 43787 deals: [0]       toUpgrade:false
1: Proving     sSet: YES active: YES tktH: 42005 seedH: 43952 deals: [0]       toUpgrade:false
2: Proving     sSet: YES active: YES tktH: 43380 seedH: 45396 deals: [0]       toUpgrade:false
3: Proving     sSet: YES active: YES tktH: 43380 seedH: 45591 deals: [0]       toUpgrade:false
6: Committing  sSet: NO  active: NO  tktH: 44128 seedH: 46750 deals: [0]       toUpgrade:false
7: Committing  sSet: NO  active: NO  tktH: 44128 seedH: 46750 deals: [0]       toUpgrade:false
8: Removed     sSet: NO  active: NO  tktH: 0     seedH: 0     deals: [100342]  toUpgrade:false
9: Committing  sSet: NO  active: NO  tktH: 44180 seedH: 46765 deals: [0]       toUpgrade:false
10: PreCommit2 sSet: NO  active: NO  tktH: 44180 seedH: 0     deals: [0]       toUpgrade:false
11: PreCommit2 sSet: NO  active: NO  tktH: 44182 seedH: 0     deals: [0]       toUpgrade:false
12: PreCommit2 sSet: NO  active: NO  tktH: 44182 seedH: 0     deals: [0]       toUpgrade:false
13: PreCommit2 sSet: NO  active: NO  tktH: 44331 seedH: 0     deals: [0]       toUpgrade:false
14: WaitSeed   sSet: NO  active: NO  tktH: 45228 seedH: 0     deals: [0]       toUpgrade:false
15: PreCommit1 sSet: NO  active: NO  tktH: 0     seedH: 0     deals: [0]       toUpgrade:false

Here are the deal lists:

ProposalCid  DealId  State               Client                                                                                  Size    Price                     Duration
...cchdm5jm  100342  StorageDealSealing  t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.0000439749375 FIL       703599
...iqmecizq  101169  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.000043952 FIL           703232
...ehof2riq  101433  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.0000439440625 FIL       703105
...krcgiyae  101817  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.000000042905957055 FIL  702973
...27kpqi3q  102115  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.0000439279375 FIL       702847
...g32gmaou  102498  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.000043920625 FIL        702730
...dxza5ez4  102618  StorageDealError    t3rey7pw2asdskrqpretcbnxjczkyr4lyindagouytbxtigbhihdwz4sxfwhf5m4cpofdd23huxraq4tmwmera  128KiB  0.00000115773 FIL         578865
...7qfpqryq  102992  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.00004391275 FIL         702604
...psy3suem  103536  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.00004408475 FIL         705356
...wbnkfrfi  0       StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.00000004304139372 FIL   705192
...cttovq7m  104159  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.0000440655 FIL          705048
...mwd66zbe  104514  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.000000043024120815 FIL  704909
...tbhkicdy  104813  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.000000043017468 FIL     704800
...vuwm5h4y  104923  StorageDealError    t3v2zpeguwiruv337whycttt362nn5yrzzurtb62dad7sqxdsgzlaj5mpgmw6rh2arlsufyvlgssduaqusmkya  8MiB    0.000075491875 FIL        603935
...zjjnh3ka  105079  StorageDealError    t3v2zpeguwiruv337whycttt362nn5yrzzurtb62dad7sqxdsgzlaj5mpgmw6rh2arlsufyvlgssduaqusmkya  8MiB    0.000075489125 FIL        603913
...irotkebm  105098  StorageDealError    t3v2zpeguwiruv337whycttt362nn5yrzzurtb62dad7sqxdsgzlaj5mpgmw6rh2arlsufyvlgssduaqusmkya  8MiB    0.000075488 FIL           603904
...odzwhexa  105189  StorageDealError    t3v2zpeguwiruv337whycttt362nn5yrzzurtb62dad7sqxdsgzlaj5mpgmw6rh2arlsufyvlgssduaqusmkya  8MiB    0.0000754865 FIL          603892
...rf7kozvm  105245  StorageDealError    t3rey7pw2asdskrqpretcbnxjczkyr4lyindagouytbxtigbhihdwz4sxfwhf5m4cpofdd23huxraq4tmwmera  8MiB    0.00007548625 FIL         603890
...vkvqjfvu  105246  StorageDealError    t3v2zpeguwiruv337whycttt362nn5yrzzurtb62dad7sqxdsgzlaj5mpgmw6rh2arlsufyvlgssduaqusmkya  8MiB    0.00007548625 FIL         603890
...6wme34xe  105415  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.00000004301075415 FIL   704690
...bjgx5ghm  105416  StorageDealError    t3rey7pw2asdskrqpretcbnxjczkyr4lyindagouytbxtigbhihdwz4sxfwhf5m4cpofdd23huxraq4tmwmera  8MiB    0.00007548625 FIL         603890
...dev76rui  105417  StorageDealError    t3rey7pw2asdskrqpretcbnxjczkyr4lyindagouytbxtigbhihdwz4sxfwhf5m4cpofdd23huxraq4tmwmera  8MiB    0.00007548625 FIL         603890
...os34kr5u  105474  StorageDealError    t3rey7pw2asdskrqpretcbnxjczkyr4lyindagouytbxtigbhihdwz4sxfwhf5m4cpofdd23huxraq4tmwmera  8MiB    0.00007548175 FIL         603854
...hvqqniji  105766  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.0000440375 FIL          704600
...zfoehjja  106394  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.00000004299732645 FIL   704470
...rzkg4u5i  106768  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128MiB  0.000044021 FIL           704336
...wbbodx4u  107154  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.00000004298182356 FIL   704216
...a4spg3lq  108017  StorageDealError    t1nslxql4pck5pq7hddlzym3orxlx35wkepzjkm3i                                               128KiB  0.00000004296339099 FIL   703914

Config:

[Sealing]
  MaxWaitDealsSectors = 2
  MaxSealingSectors = 0
  MaxSealingSectorsForDeals = 2
  WaitDealsDelay = "6h0m0s"
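For reference, a toy version of the gate that produced the "too many sectors sealing" failures below might look like this (illustrative only; the real check lives in lotus's storage-sealing code and operates on full sector state, not a bare counter — a limit of 0 is treated as unlimited):

```go
package main

import (
	"errors"
	"fmt"
)

// checkDealSectorLimit mirrors the "too many sectors sealing" gate:
// when the limit is non-zero and the number of deal-carrying sectors
// already in the sealing pipeline has reached it, AddPiece is refused.
// A limit of 0 means no limit.
func checkDealSectorLimit(sealingForDeals, maxSealingSectorsForDeals int) error {
	if maxSealingSectorsForDeals > 0 && sealingForDeals >= maxSealingSectorsForDeals {
		return errors.New("too many sectors sealing")
	}
	return nil
}

func main() {
	// With MaxSealingSectorsForDeals = 2 (as in the config above), a
	// third deal-carrying sector is rejected until one finishes sealing.
	fmt.Println(checkDealSectorLimit(1, 2)) // <nil>
	fmt.Println(checkDealSectorLimit(2, 2)) // too many sectors sealing
}
```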

Recent logs:

2020-09-10T11:44:14.720+0300    INFO    storagemarket_impl  impl/provider.go:199    Handling storage deal proposal!
2020-09-10T11:44:14.866+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventOpen", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealValidating", "message": ""}
2020-09-10T11:44:14.870+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDealDeciding", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealAcceptWait", "message": ""}
2020-09-10T11:44:14.875+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDataRequested", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealWaitingForData", "message": ""}
2020-09-10T11:44:15.025+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDataTransferInitiated", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealTransferring", "message": ""}
2020-09-10T11:44:15.320+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDataTransferCompleted", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealVerifyData", "message": ""}
2020-09-10T11:44:15.331+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventVerifiedData", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealEnsureProviderFunds", "message": ""}
2020-09-10T11:44:15.334+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventFundsReserved", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealEnsureProviderFunds", "message": ""}
2020-09-10T11:44:15.335+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventFunded", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealPublish", "message": ""}
2020-09-10T11:44:15.336+0300    INFO    storageadapter  storageadapter/provider.go:60   publishing deal
2020-09-10T11:44:15.415+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDealPublishInitiated", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealPublishing", "message": ""}
2020-09-10T11:45:15.893+0300    INFO    graphsync   impl/graphsync.go:244   Graphsync ReceiveError: stream reset
2020-09-10T11:46:19.601+0300    ERROR   storagemarket_impl  impl/provider.go:492    failed to write deal status response: stream reset
2020-09-10T11:47:54.535+0300    WARN    net/identify    go-log@v1.0.4/log.go:175    error reading identify message: stream reset
2020-09-10T11:48:54.533+0300    WARN    net/identify    go-log@v1.0.4/log.go:175    error reading identify message: stream reset
2020-09-10T11:49:50.831+0300    WARN    net/identify    go-log@v1.0.4/log.go:175    error reading identify message: stream reset
2020-09-10T11:50:00.773+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDealPublished", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealStaged", "message": ""}
2020-09-10T11:50:00.788+0300    INFO    sectors storage-sealing/sealing.go:144  Adding piece for deal 108017 (publish msg: bafy2bzacecmf6vdcr2zt73oqzkgrppdfkqusoqbvn26pp7wgkwygk2s3jee3q)
2020-09-10T11:50:00.789+0300    INFO    markets loggers/loggers.go:19   storage event   {"name": "ProviderEventDealHandoffFailed", "proposal CID": "bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq", "state": "StorageDealFailing", "message": "handing off deal to node: AddPiece failed: getting available sector: too many sectors sealing"}
2020-09-10T11:50:00.797+0300    WARN    providerstates  providerstates/provider_states.go:443   deal bafyreiczfbpzivuje6gadsvpicwviwgmtur33zvtirql7yd7qia4spg3lq failed: handing off deal to node: AddPiece failed: getting available sector: too many sectors sealing
hannahhoward commented 4 years ago

Solution implementation details, per discussion with @magik6k:

Proposed solution:

  1. In the Lotus ProviderNodeAdapter for the storage market, in OnDealComplete, if AddPiece returns an error, inspect it to see whether it is the MaxSealingSectorsForDeals ("too many sectors sealing") issue.
  2. If it is, set a 5-minute timer, read from the timer channel, and then try again. (Definitely stop on any other error or if the ctx is cancelled.)
  3. It is OK to do this synchronously -- no goroutine.

cc: @aarshkshah1992