filecoin-project / oni

👹 (DEPRECATED; see README) Project Oni | Network Validation
https://docs.google.com/document/d/16jYL--EWYpJhxT9bakYq7ZBGLQ9SB940Wd1lTDOAbNE
7 stars 5 forks source link

Offline deal flow / dumbo drop #173

Open yusefnapora opened 4 years ago

yusefnapora commented 4 years ago

This adds a deals-offline test case that tries to figure out the client-side scaling limits for proposing and tracking deals (closes #160).

The basic flow is:

The miner side of things is currently pretty broken; lots of gas limit errors like this:

Jul 28 17:07:19.458837  INFO    61.9329s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:144       VM.Call failure: not enough gas: used=14966134, available=14966134 (RetCode=7): {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458928  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/lotus/chain/vm.(*Runtime).chargeGasInternal      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458961  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/lotus@v0.4.3-0.20200727232759-291d2fe2ded7/chain/vm/runtime.go:555       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458976  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:367       vmctx send failed: to: t04, method: 4: ret: [], err: not enough gas: used=14966134, available=14966134 (RetCode=7)      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458990  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:315       Abortf: failed to enroll cron event     {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459038  INFO    61.9332s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:144       VM.Call failure: failed to enroll cron event (RetCode=7):       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459085  INFO    61.9332s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459103  INFO    61.9332s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24    {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459124  INFO    61.9332s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/vm.go:419    Send actor error        {"from": "t3sdzfoz2rkohcoixjaxgnxmqeupyrtqb44f6bb27zkowwhncn4biawzkpasppy3pl52opzritaamjn2ol5aca", "to": "t01000", "nonce": 80, "method": "6", "height": "29", "error": "failed to enroll cron event (RetCode=7):\n    github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess\n        /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24"}      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459313  INFO    61.9334s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.458Z WARN    vm      vm/runtime.go:144       VM.Call failure: not enough gas: used=14966134, available=14966134 (RetCode=7): {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459382  INFO    61.9335s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/lotus/chain/vm.(*Runtime).chargeGasInternal      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459441  INFO    61.9336s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/lotus@v0.4.3-0.20200727232759-291d2fe2ded7/chain/vm/runtime.go:555       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459505  INFO    61.9336s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.458Z WARN    vm      vm/runtime.go:367       vmctx send failed: to: t04, method: 4: ret: [], err: not enough gas: used=14966134, available=14966134 (RetCode=7)      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459540  INFO    61.9337s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.458Z WARN    vm      vm/runtime.go:315       Abortf: failed to enroll cron event     {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459573  INFO    61.9337s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.458Z WARN    vm      vm/runtime.go:144       VM.Call failure: failed to enroll cron event (RetCode=7):       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459592  INFO    61.9337s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459656  INFO    61.9338s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24    {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459680  INFO    61.9338s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.458Z WARN    vm      vm/vm.go:419    Send actor error        {"from": "t3sdzfoz2rkohcoixjaxgnxmqeupyrtqb44f6bb27zkowwhncn4biawzkpasppy3pl52opzritaamjn2ol5aca", "to": "t01000", "nonce": 81, "method": "6", "height": "29", "error": "failed to enroll cron event (RetCode=7):\n    github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess\n        /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24"}      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459737  INFO    61.9339s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.459Z WARN    vm      vm/runtime.go:144       VM.Call failure: not enough gas: used=14968534, available=14968534 (RetCode=7): {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459797  INFO    61.9339s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/lotus/chain/vm.(*Runtime).chargeGasInternal      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459832  INFO    61.9340s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/lotus@v0.4.3-0.20200727232759-291d2fe2ded7/chain/vm/runtime.go:555       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459877  INFO    61.9340s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.459Z WARN    vm      vm/runtime.go:367       vmctx send failed: to: t04, method: 4: ret: [], err: not enough gas: used=14968534, available=14968534 (RetCode=7)      {"req_id": "04bdfdc3"}

There are also some errors about missing deals when sealing kicks in:

Jul 28 19:30:33.063336  INFO    129.8215s    MESSAGE << clients[000] (7113e3) >> deal states:
StorageDealSealing  120
        {"req_id": "9deca62e"}
Jul 28 19:30:34.018913  INFO    130.7775s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.018Z        WARN    chain   chain/sync.go:647       [INSECURE-POST-VALIDATION] if you see this outside of a test, it is a severe bug!       {"req_id": "9deca62e"}
Jul 28 19:30:34.726583  INFO    131.4851s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.725Z        WARN    vm      vm/runtime.go:315       Abortf: failed to validate dealProposals for activation: dealId 117 not found   {"req_id": "9deca62e"}
Jul 28 19:30:34.726657  INFO    131.4852s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:144       VM.Call failure: failed to validate dealProposals for activation: dealId 117 not found (RetCode=20):    {"req_id": "9deca62e"}
Jul 28 19:30:34.726677  INFO    131.4853s      ERROR << miners[000] (f9f87c) >>     github.com/filecoin-project/specs-actors/actors/builtin.RequireNoErr        {"req_id": "9deca62e"}
Jul 28 19:30:34.726691  INFO    131.4853s      ERROR << miners[000] (f9f87c) >>         /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:34   {"req_id": "9deca62e"}
Jul 28 19:30:34.726705  INFO    131.4853s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:367       vmctx send failed: to: t05, method: 5: ret: [], err: failed to validate dealProposals for activation: dealId 117 not found (RetCode=20) {"req_id": "9deca62e"}
Jul 28 19:30:34.726723  INFO    131.4853s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:315       Abortf: failed to verify deals and get deal weight      {"req_id": "9deca62e"}
Jul 28 19:30:34.726738  INFO    131.4853s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:144       VM.Call failure: failed to verify deals and get deal weight (RetCode=20):       {"req_id": "9deca62e"}
Jul 28 19:30:34.726751  INFO    131.4853s      ERROR << miners[000] (f9f87c) >>     github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess      {"req_id": "9deca62e"}
Jul 28 19:30:34.726764  INFO    131.4853s      ERROR << miners[000] (f9f87c) >>         /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24   {"req_id": "9deca62e"}
Jul 28 19:30:34.726778  INFO    131.4854s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/vm.go:419    Send actor error        {"from": "t3ri3mdrcma3abddkzaio24y5biubuvuze26och5tvw7ogpuzc2iakck2wx6x4j2efmwsy7lqlr43ckjrvb5na", "to": "t01000", "nonce": 403, "method": "6", "height": "64", "error": "failed to verify deals and get deal weight (RetCode=20):\n    github.com/filecoin-project/specs-actors/actors/builtin.RequireSuccess\n        /go/pkg/mod/github.com/filecoin-project/specs-actors@v0.8.1-0.20200724015154-3c690d9b7e1d/actors/builtin/shared.go:24"} {"req_id": "9deca62e"}
Jul 28 19:30:34.726793  INFO    131.4854s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:315       Abortf: failed to validate dealProposals for activation: dealId 95 not found    {"req_id": "9deca62e"}
Jul 28 19:30:34.726808  INFO    131.4854s      ERROR << miners[000] (f9f87c) >> 2020-07-28T19:30:34.726Z        WARN    vm      vm/runtime.go:144       VM.Call failure: failed to validate dealProposals for activation: dealId 95 not found (RetCode=20):     {"req_id": "9deca62e"}

However the errors don't seem to immediately fail any deals and apparently the gas issue is affecting real deployments as well.

On the client side, all the proposed deals are making into the chain and are returned by ClientListDeals, so I think that this test is still valid for finding the client limits even if the miner is a bit borked.

Results so far, running on a k8s cluster with c5.2xlarge workers (8 CPUs, 16GB ram) and a testground daemon hacked to have longer timeouts:

cc @ribasushi @jnthnvctr

btw, @nonsense we may not want to merge this to master yet; I commented out the change to the pre-commit delay in init.go to be closer to the production scenario, so it may mess with the window post tests. I'm not sure it actually makes a difference for this test though, so we can probably put it back.

yusefnapora commented 4 years ago

Unfortunately, we lose the stdout / stderr from tests running on k8s, so the lotus internal logs are missing for these runs.

I may try running again today with larger worker VMs and the keep_service flag so I can grab the logs from the kubernetes pods after the run.