filecoin-project / oni

👹 (DEPRECATED; see README) Project Oni | Network Validation
https://docs.google.com/document/d/16jYL--EWYpJhxT9bakYq7ZBGLQ9SB940Wd1lTDOAbNE

[test scenario] dumbo drop client-side deal proposal scalability tests #160


raulk commented 4 years ago

Describe the test scenario.

We want to test what level of concurrency and volume a Lotus deal proposer (client only, no mining) is able to withstand. The results will help us determine the scalability of the dumbo drop client pool we need to operate to materialise these deals in the network.

More concretely, this is a stress test of deal proposal, management, and monitoring from the client's perspective. For this particular test, exercising the miner is not required; ideally we'd be able to mock it, but that could prove very difficult.

Instead, we opt for the following approach to isolate the miner's and the chain's scalability from the test:

In order to simulate the end-to-end process with as high a degree of fidelity as possible, we would use the offline deal flow.

An idea is to have the local miner produce 7,700 random 16-byte values, store that data where the offline deal flow expects to find it, and advertise the corresponding CIDs (or CommP values, I guess?) via the sync service to the client. The client would then make offline deals for those CIDs.
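As a rough illustration of what that could look like in the test plan, here is a sketch assuming the Testground sync service and the Lotus FullNode API of roughly that vintage; the topic name, the `PieceInfo` payload, and the deal parameters are made up for the example (and import paths/types may differ across sdk-go and Lotus versions), so treat it as a shape, not the actual plan code:

```go
package dealstress

import (
	"context"

	"github.com/filecoin-project/go-address"
	"github.com/filecoin-project/go-fil-markets/storagemarket"
	"github.com/filecoin-project/lotus/api"
	"github.com/filecoin-project/lotus/chain/types"
	"github.com/filecoin-project/specs-actors/actors/abi"
	"github.com/ipfs/go-cid"
	"github.com/testground/sdk-go/sync"
)

// PieceInfo is the (hypothetical) payload the miner advertises for each
// locally generated piece, so the client can propose an offline deal for it.
type PieceInfo struct {
	PayloadCID cid.Cid               // root CID the deal will reference
	PieceCID   cid.Cid               // CommP of the piece
	PieceSize  abi.UnpaddedPieceSize // unpadded piece size
}

// piecesTopic is an illustrative sync-service topic name.
var piecesTopic = sync.NewTopic("offline-pieces", &PieceInfo{})

// advertisePiece is run by the miner after it has generated a piece and
// placed the data where the offline deal flow expects to find it.
func advertisePiece(ctx context.Context, client *sync.DefaultClient, pi *PieceInfo) error {
	_, err := client.Publish(ctx, piecesTopic, pi)
	return err
}

// proposeOfflineDeals is run by the client: it receives `count` advertised
// pieces and proposes one offline (manual-transfer) deal per piece.
func proposeOfflineDeals(ctx context.Context, client *sync.DefaultClient, full api.FullNode,
	miner, wallet address.Address, count int) error {
	ch := make(chan *PieceInfo, count)
	if _, err := client.Subscribe(ctx, piecesTopic, ch); err != nil {
		return err
	}
	for i := 0; i < count; i++ {
		pi := <-ch
		_, err := full.ClientStartDeal(ctx, &api.StartDealParams{
			Data: &storagemarket.DataRef{
				TransferType: storagemarket.TTManual, // offline deal: no graphsync transfer
				Root:         pi.PayloadCID,
				PieceCid:     &pi.PieceCID,
				PieceSize:    pi.PieceSize,
			},
			Wallet:            wallet,
			Miner:             miner,
			EpochPrice:        types.NewInt(1000), // arbitrary for the test
			MinBlocksDuration: 640000,
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```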

Provide any background and technical implementation details.

See above; since we're looking to find the boundary of a single client by stressing it, just a 1-client, 1-miner setup should be sufficient.

What should we measure?

Discomfort factor (0-10).

10 (owners: @jnthnvctr and @ribasushi).

Additional remarks.

ribasushi commented 4 years ago

> make tiny deals, e.g. 16-byte deals

This is not possible; due to various cryptographic construction limits, the minimum piece size is 127 bytes: https://github.com/filecoin-project/rust-fil-proofs/issues/1231#issuecomment-663915253
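For context on where 127 comes from: Fr32 padding inserts two zero bits per 254 bits of payload, so 127 raw bytes pad out to exactly one 128-byte piece, the smallest the proofs accept. A quick sanity check against the abi helpers (shown with the current go-state-types import path; older Lotus versions expose the same type from specs-actors):

```go
package main

import (
	"fmt"

	"github.com/filecoin-project/go-state-types/abi"
)

func main() {
	// 127 unpadded bytes -> a 128-byte padded piece, the minimum piece size.
	unpadded := abi.UnpaddedPieceSize(127)
	fmt.Println(unpadded.Padded()) // prints 128
}
```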

The rest looks great!

raulk commented 4 years ago

@ribasushi thanks for the remark.

ribasushi commented 4 years ago

> An idea is to have the local miner produce 7,700 random 16-byte values, store that data where the offline deal flow expects to find it, and advertise the corresponding CIDs (or CommP values, I guess?) via the sync service to the client.

You can use deterministic pseudorandom data generated by https://github.com/jbenet/go-random/blob/master/lib.go#L16. This way all you need to transfer is a single nonce plus the number of increments of that nonce (i.e. 2 values).

We use this code heavily in go-ipfs testing: https://github.com/ipfs/go-ipfs/blob/777d306f6e66e31a05f43a337bce272050407386/test/sharness/t0082-repo-gc-auto.sh#L20-L24
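A minimal sketch of generating the deal payloads this way, assuming the WritePseudoRandomBytes helper from that package; the directory layout, seed, and sizes below are illustrative. Either side can regenerate byte-identical files from just the seed and the count:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	random "github.com/jbenet/go-random"
)

// generatePayloads writes `count` deterministic payload files of `size` bytes
// each into `dir`. Running it with the same baseSeed and count on the client
// and the miner reproduces identical bytes on both sides.
func generatePayloads(dir string, baseSeed, count, size int64) error {
	for i := int64(0); i < count; i++ {
		f, err := os.Create(filepath.Join(dir, fmt.Sprintf("payload-%d.bin", i)))
		if err != nil {
			return err
		}
		if err := random.WritePseudoRandomBytes(size, f, baseSeed+i); err != nil {
			f.Close()
			return err
		}
		if err := f.Close(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// e.g. 7,700 pieces of 127 bytes (the minimum piece size) from seed 42.
	if err := generatePayloads(os.TempDir(), 42, 7700, 127); err != nil {
		panic(err)
	}
}
```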

jnthnvctr commented 4 years ago

do we also need to test whether there are any scalability issues when a client queries for deals with multiple miners? @ribasushi you may have further thoughts on whether this is something that's being looked at elsewhere

ribasushi commented 4 years ago

> do we also need to test whether there are any scalability issues when a client queries for deals with multiple miners?

@jnthnvctr I think it doesn't matter at this stage...

yusefnapora commented 4 years ago

@ribasushi, @jnthnvctr I've made a start on this (will push soon), and just want to check in with you guys about what specifically we're measuring.

As I understand it, we're mostly worried about ClientListDeals falling over if a client has too many active deals. In other words, the specifics of the deal (whether it's online or offline, etc) don't matter as much as the total number of deals per client. Is this correct?

I've made a start on the offline deal flow, but may switch to online deals if the distinction isn't important.
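If the main worry is ClientListDeals degrading as the deal count grows, one way to capture that regardless of the online/offline distinction is simply to time the call after each batch of proposals. A hypothetical sketch against the Lotus FullNode API:

```go
package dealstress

import (
	"context"
	"time"

	"github.com/filecoin-project/lotus/api"
)

// measureListDeals times a single ClientListDeals call and returns the current
// deal count alongside the latency, so we can plot latency vs. deal count as
// the stress test progresses.
func measureListDeals(ctx context.Context, full api.FullNode) (int, time.Duration, error) {
	start := time.Now()
	deals, err := full.ClientListDeals(ctx)
	if err != nil {
		return 0, 0, err
	}
	return len(deals), time.Since(start), nil
}
```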

ribasushi commented 4 years ago

@yusefnapora I proposed on slack that we meet for 10 mins to sync up via a higher-bandwidth channel. If you do not have availability, I will try to put my thoughts into a comment here.

yusefnapora commented 4 years ago

thanks for the sync meeting @ribasushi & @jnthnvctr. I just wanted to summarize here:

So far I've tried proposing 8000 deals using the offline deal flow, and the client can fetch them with ClientListDeals without any issues. I hit gas limit errors on the miner side when trying to activate the deals though:

```
Jul 28 17:07:19.458837  INFO    61.9329s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:144       VM.Call failure: not enough gas: used=14966134, available=14966134 (RetCode=7): {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458928  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >>     github.com/filecoin-project/lotus/chain/vm.(*Runtime).chargeGasInternal      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458961  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >>         /go/pkg/mod/github.com/filecoin-project/lotus@v0.4.3-0.20200727232759-291d2fe2ded7/chain/vm/runtime.go:555       {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458976  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:367       vmctx send failed: to: t04, method: 4: ret: [], err: not enough gas: used=14966134, available=14966134 (RetCode=7)      {"req_id": "04bdfdc3"}
Jul 28 17:07:19.458990  INFO    61.9331s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:315       Abortf: failed to enroll cron event     {"req_id": "04bdfdc3"}
Jul 28 17:07:19.459038  INFO    61.9332s      ERROR << miners[000] (0d9cc7) >> 2020-07-28T17:07:19.457Z WARN    vm      vm/runtime.go:144       VM.Call failure: failed to enroll cron event (RetCode=7):       {"req_id": "04bdfdc3"}
```

Going to try adjusting the BlockGasLimit as suggested by @Kubuxu on slack and see if that helps.

FWIW, the error happens at the deal StartEpoch, after the offline data has been successfully imported. Both the miner and the client have > 100M FIL in their wallets, so I don't think either is strapped for cash :)

My plan for today is to try to figure out the gas limit so the deals can succeed. If I can't figure that out easily, I'll just push the StartEpoch far into the future so they don't fail and just keep proposing deals until the client chokes.
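For the fallback, the start epoch is just a field on the proposal, so pushing it far out should be a small change at proposal time. The snippet below is a hypothetical sketch (field and helper names are mine, and the DealStartEpoch field name should be double-checked against the vendored Lotus version), with the offset assuming 30-second epochs:

```go
package dealstress

import (
	"context"

	"github.com/filecoin-project/lotus/api"
	"github.com/filecoin-project/specs-actors/actors/abi"
)

// epochsPerDay assumes the 30-second epoch duration used on these test networks.
const epochsPerDay = 24 * 60 * 60 / 30 // 2880

// farFutureStartEpoch picks a start epoch roughly two months past the current
// head, so deal activation (and its gas cost) never fires while we keep
// proposing deals against the client.
func farFutureStartEpoch(ctx context.Context, full api.FullNode) (abi.ChainEpoch, error) {
	head, err := full.ChainHead(ctx)
	if err != nil {
		return 0, err
	}
	return head.Height() + abi.ChainEpoch(60*epochsPerDay), nil
}
```

The resulting epoch would then be set as the deal's start epoch (DealStartEpoch in StartDealParams) when proposing.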

ribasushi commented 4 years ago

> I'll just push the StartEpoch far into the future so they don't fail and just keep proposing deals until the client chokes.

This is actually the correct approach, as the current design calls for deal proposals to start 2 months in the future.
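For a rough sense of the number behind "2 months", assuming 30-second epochs: 86,400 s/day ÷ 30 s = 2,880 epochs/day, so ~60 days ≈ 172,800 epochs ahead of the current head.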