filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust
Other
489 stars 314 forks source link

Improve error handling for threaded verification errors #1748

Closed cryptonemo closed 5 months ago

cryptonemo commented 5 months ago

For synth porep, we switch to threaded/parallel proof verification as an optimization. This takes it a step further and now allows errors to be propagated/returned properly.

Note: There is a FIXME in the code that will be removed after it's properly tested via filecoin-ffi testing.

cryptonemo commented 5 months ago

Looks like all the CI failures are 'Infrastructure failures' and I noticed I can no longer restart failed jobs. Everything passes locally.

cryptonemo commented 5 months ago

Can't comm_d_proof_inner.validate and the comm_r last asserts above fail in a similar way if the trees are corrupted?

Yes, there are other cases. One thing (for testing) at a time. Once wired through FFI and I'm sure it's propagated properly, adding in the others are straight-forward.

cryptonemo commented 5 months ago

Good news is that everything worked wired through locally. I'll look into the other cases.

cryptonemo commented 5 months ago

The current code reports the most recent error only. A possible change would be to have a vector of errors and pushing them into a vector. Once done print the contents of the whole vector if it's non-empty.

That's incorrect. Look closer.

vmx commented 5 months ago

The current code reports the most recent error only. A possible change would be to have a vector of errors and pushing them into a vector. Once done print the contents of the whole vector if it's non-empty.

That's incorrect. Look closer.

Oh right. It's on the first error it encounters. That's probably good enough.

vmx commented 5 months ago

The current code reports the most recent error only. A possible change would be to have a vector of errors and pushing them into a vector. Once done print the contents of the whole vector if it's non-empty.

That's incorrect. Look closer.

Oh right. It's on the first error it encounters. That's probably good enough.

I re-read the code again. Now I think it collects all errors. But wow, there must be a way to make all this easier to follow.

cryptonemo commented 5 months ago

Updated the error handling on all 4 of the threaded verifiers added for synth porep proofs. Wasted some cycles looking into it and of the 4 verifiers that were made threaded for consistency, only 1 (or 2) had the most perf impact. I recall this now, but again, the parallelism was added across the board for consistency, and now we consistently propagate errors across them all. There's also still a FIXME in place (on purpose) and some additional testing is needed (via my local api and ffi branches, soon to be PRs).