filecoin-project / oni

👹 (DEPRECATED; see README) Project Oni | Network Validation
https://docs.google.com/document/d/16jYL--EWYpJhxT9bakYq7ZBGLQ9SB940Wd1lTDOAbNE

[test request] deals stress test #18

Open raulk opened 4 years ago

raulk commented 4 years ago

What would you like us to test?

Stress-test the percentage of deals that go through under adverse conditions (e.g. nodes suddenly going offline).

Technical implementation details.

Also generate a baseline that captures how the system behaves under normal/ideal conditions.
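
A minimal sketch of how the baseline and adverse runs could be compared, assuming each run yields one final state per deal. The function names and the choice of `StorageDealActive` as the success state are assumptions made for illustration; they are not part of the test plan.

```python
def success_rate(final_states, success_state="StorageDealActive"):
    """Return the percentage of deals that ended in the success state.

    `final_states` is a list with one final state name per deal.
    Treating "StorageDealActive" as success is an assumption of this sketch.
    """
    if not final_states:
        raise ValueError("no deals recorded")
    succeeded = sum(1 for state in final_states if state == success_state)
    return 100.0 * succeeded / len(final_states)


def degradation(baseline_states, adverse_states):
    """Percentage points lost in the adverse run relative to the baseline."""
    return success_rate(baseline_states) - success_rate(adverse_states)
```

A positive `degradation` value means the adverse run completed a smaller share of deals than the baseline run.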

What should we measure?

  1. Test several deals from one client to multiple miners (1 client => N miners).
  2. Test several deals from many clients to one miner (N clients => 1 miner).
  3. "Make a deal, wipe the provider blockstore, verify unsealing works — I’m not sure this has truly been tested."
    • @raulk: this is more of a data lifecycle test. In other words: clear the miner cache so that, when queried for the data, it has to fall back to unsealing the sector (recovering it from archive/cold storage).
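
The first two topologies can be sketched as lists of (client, miner) pairs for a test driver to iterate over. This is only an illustration: the helper names, the `deals_per_pair` parameter, and the node labels are hypothetical.

```python
def one_to_many(client, miners, deals_per_pair=1):
    """1 client => N miners: the single client makes deals with every miner."""
    return [(client, miner) for miner in miners for _ in range(deals_per_pair)]


def many_to_one(clients, miner, deals_per_pair=1):
    """N clients => 1 miner: every client makes deals with the single miner."""
    return [(client, miner) for client in clients for _ in range(deals_per_pair)]


# Hypothetical node labels, used only to show the shape of the output.
pairs = one_to_many("client-0", ["miner-0", "miner-1"])
```

Each pair stands for one deal to attempt; raising `deals_per_pair` repeats the same pairing to increase load.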

On a scale from 0-10, what's the proposed _discomfort factor_? In other words, how uncomfortable would you be if we went live without having tested this? Explain why.

TBD.

Additional remarks.

Requestor: @hannahhoward.

raulk commented 4 years ago

@nonsense @vyzo I edited the description to add the details provided by Hannah.

vyzo commented 4 years ago

A couple of observations from running 300 deals, both serially and concurrently, with 2 miners and 3 clients. In the serial case, after an overnight run, one client succeeded, but the other two got terminally stuck in the StorageDealSealing state. In the concurrent case, almost all deals got stuck in the StorageDealSealing state.
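
One way to make "stuck" measurable is to flag any deal that has sat in a non-terminal state for longer than a chosen idle threshold. The sketch below assumes the harness records, per deal, its current state and the time of its last state change; the tuple layout, the threshold, and the set of terminal states are assumptions, not something taken from these runs.

```python
def stuck_deals(deals, now, max_idle,
                terminal_states=("StorageDealActive", "StorageDealError")):
    """Return the IDs of deals idle in a non-terminal state for too long.

    `deals` is an iterable of (deal_id, state, last_change) tuples, where
    `last_change` and `now` share one unit (seconds or chain epochs).
    The terminal state names are an assumption of this sketch.
    """
    return [
        deal_id
        for deal_id, state, last_change in deals
        if state not in terminal_states and now - last_change > max_idle
    ]
```

With epochs as the unit, the threshold would need to account for the chain itself stalling, since a halted chain stops epoch-based clocks too.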

raulk commented 4 years ago

@vyzo thanks for the input! We need to make this actionable so that we can investigate further. It may well be an issue on our end, possibly related to the fact that we're in catch-up mining mode and miners may be building separate chains. Could you please upload the logs from both runs?

vyzo commented 4 years ago

So digging further in the concurrent stress test logs, the miners just stopped at block 155; no errors.

raulk commented 4 years ago

We have reported the issues we found upstream: https://github.com/filecoin-project/lotus/issues/2294.

raulk commented 4 years ago

https://github.com/filecoin-project/lotus/issues/2293
https://github.com/filecoin-project/lotus/issues/2292
https://github.com/filecoin-project/lotus/issues/2291
https://github.com/filecoin-project/lotus/issues/2250
https://github.com/filecoin-project/lotus/issues/2249
https://github.com/filecoin-project/lotus/issues/2294