filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust
Other
491 stars 314 forks source link

Regression in PoSt proofs #1048

Closed vmx closed 4 years ago

vmx commented 4 years ago

Description

There is a performance regression between master HEAD (https://github.com/filecoin-project/rust-fil-proofs/commit/0568667c442a6a54707f5d15ab78c690d7f91661) and https://github.com/filecoin-project/rust-fil-proofs/commit/7086c03e519d069d3926a14056c468e9f66da8b2). On master the PoSt proofs take longer.

When turning on logging, I identified a new "gap" where the regression seems to be:

older version:

2020-02-26T13:33:11.534 6150 high INFO storage_proofs::compound_proof > snark_proof:start
2020-02-26T13:33:11.534 6150 high INFO bellperson::groth16::prover > Bellperson 0.6.1 is being used!
2020-02-26T13:33:12.086 6150 high DEBUG bellperson::gpu::locks > Acquiring priority lock...

master HEAD (with regression):

2020-02-27T10:42:07.687 25201 high INFO storage_proofs::compound_proof > snark_proof:start
2020-02-27T10:42:07.687 25201 high INFO bellperson::groth16::prover > Bellperson 0.6.2 is being used!
2020-02-27T10:42:12.204 25201 high DEBUG bellperson::gpu::locks > Acquiring priority lock..

Between the last two log lines there is only a ~0.5s in the old version, on master it's ~4.5s.

In order to reproduce this simply run

RUST_LOG=trace cargo run --release --bin gpu-cpu-test -- --parallel false

Acceptance criteria

The gap is as small as with the old version.

Risks + pitfalls

Where to begin

porcuquine commented 4 years ago

It looks like some of the fancy stuff I had to do to reduce the circuit complexity of Poseidon added some runtime overhead in the form of synthesis time.

My first try at a fix yields this:

2020-02-28T21:58:36.745 21378 high INFO bellperson::groth16::prover > Bellperson 0.6.2 is being used!
2020-02-28T21:58:39.111 21378 high DEBUG bellperson::gpu::locks > Acquiring priority lock...

This is a 'gap' of ~2.4 seconds. I don't have a previous timing for my dev machine, but if we assume your numbers apply, then this cuts the regression about in half, from 4 down to 2 seconds.

I can do some more things of this nature and will iterate a bit.

porcuquine commented 4 years ago

And, here we go:

2020-02-29T11:31:58.708 33990 high INFO bellperson::groth16::prover > Bellperson 0.6.2 is being used!
2020-02-29T11:32:00.238 33990 high DEBUG bellperson::gpu::locks > Acquiring priority lock...

If I follow, this gets us back down to the expected gap.

For tracking purposes, this will involve:

porcuquine commented 4 years ago

After some more testing, it looks like the happy result above was a bit too optimistic. That set of changes also caused a regression from the original correctness work (for Poseidon) leading to the performance regression. I think I now have something which captures the consistent intersection of these new performance improvements while retaining previous Poseidon correctness fixes.

2020-02-29T15:19:13.660 42956 high INFO bellperson::groth16::prover > Bellperson 0.6.2 is being used!
2020-02-29T15:19:15.200 42956 high DEBUG bellperson::gpu::locks > Acquiring priority lock...

NOTE: I updated this because I used the wrong end event the first time.

This makes the gap 1.5s, which — given the above context — is probably a good place to proceed with those changes. If we decide further work needs to happen for this issue, it can come in a next round.

porcuquine commented 4 years ago

Also, bear in mind that any increase in synthesis time is traded against much fewer constraints, which should translate into faster proof-generation. It would be good to get some numbers on the total cost before/after.

vmx commented 4 years ago

If "proof-generation" is the groth_proof_time then it went down significantly. from ~51ms to ~13ms on my machine (though we are taking about ms here).

vmx commented 4 years ago

FYI: I just reran things again with the newest versions (I updated my Cargo.lock), now the gap is even bigger on that machine, ~6.2s (it was ~4.5s).

porcuquine commented 4 years ago

It's true, synthesis was slow! But this was fixed in #1174, and Window PoSt is now reported to perform much faster in practice.