filecoin-project / rust-fil-proofs

Proofs for Filecoin in Rust

Create binaries to run phases independently #1678

Open vmx opened 1 year ago

vmx commented 1 year ago

Description

Currently, Lotus consumes the Filecoin proofs via the FFI as a single library. In addition to the library use case, the idea is to provide separate binaries for each phase (perhaps even more fine-grained, but the phases are fine as a start). This would serve several needs that have come up in the past:

Acceptance criteria

There are binaries that can be run in sequence to do the full lifecycle of sealing and unsealing a sector.

Risks + pitfalls

This may lead to refactoring where the current internal APIs do not fit. I see that as a good thing, though, as the APIs should already be flexible enough to make this work.

Where to begin

benchy already partly supports running only certain phases, but it is not very flexible and has known issues.

cryptonemo commented 1 year ago

To be clear, Proofs is a library and will remain that way. Binaries would be an enhancement, using the library.

vmx commented 1 year ago

> To be clear, Proofs is a library and will remain that way. Binaries would be an enhancement, using the library.

Thanks for calling this out. I've changed the first paragraph to make this clearer.

RobQuistNL commented 1 year ago

This would be an awesome feature to have. It would greatly help with benchmarking separate stages and working on improvements.

It would be very nice to have a way to validate that the result of the benchmark is correct, too. Not sure if that's inherently possible as we're skipping some steps though.

An example would be:

cargo run --bin benchy -- single-step -- ap --sectornumber 123 --size 512MiB --result /mnt/benchfiles # Generates "unsealed" sector file (/mnt/benchfiles/unsealed/123/)
cargo run --bin benchy -- single-step -- pc1 --sectornumber 123 --result /mnt/benchfiles # Uses the "unsealed" sector file from the AP step, generates the layer files in the "cache" folder (/mnt/benchfiles/cache/123/) (if I'm not mistaken, PC1 in lotus-worker stores them there too)
cargo run --bin benchy -- single-step -- pc2 --sectornumber 123 --result /mnt/benchfiles # Uses the layer files from the PC1 step, generates its files in the "cache" folder (/mnt/benchfiles/cache/123/) (if I'm not mistaken, PC2 in lotus-worker stores them there too)

and so on for C1 / C2
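
The invocations proposed above could be driven from a small runner. The following is only a sketch: the `single-step` subcommand and the `--sectornumber`, `--size` and `--result` flags are the hypothetical ones from the example above (benchy does not provide them today), and it assumes that C1 and C2 would take the same flags as PC1 and PC2.

```rust
use std::process::Command;

/// Sealing steps in the order they would run. Names follow the proposal above.
const STEPS: [&str; 5] = ["ap", "pc1", "pc2", "c1", "c2"];

/// Builds the `cargo` argument list for one step of the *proposed*
/// (hypothetical) CLI. Only the `ap` step gets `--size`, as in the example.
fn step_args(step: &str, sector: u64, size: &str, result_dir: &str) -> Vec<String> {
    let mut args: Vec<String> = ["run", "--bin", "benchy", "--", "single-step", "--", step]
        .iter()
        .map(|s| s.to_string())
        .collect();
    args.push("--sectornumber".to_string());
    args.push(sector.to_string());
    if step == "ap" {
        args.push("--size".to_string());
        args.push(size.to_string());
    }
    args.push("--result".to_string());
    args.push(result_dir.to_string());
    args
}

/// Runs every step in order and stops at the first one that fails.
/// Returns `Ok(true)` only if all steps exited successfully.
fn run_all(sector: u64, size: &str, result_dir: &str) -> std::io::Result<bool> {
    for step in STEPS {
        let status = Command::new("cargo")
            .args(step_args(step, sector, size, result_dir))
            .status()?;
        if !status.success() {
            eprintln!("step {step} failed with {status}");
            return Ok(false);
        }
    }
    Ok(true)
}
```

Calling `run_all(123, "512MiB", "/mnt/benchfiles")` would then walk through AP, PC1, PC2, C1 and C2 for sector 123, with each step reading what the previous one left under the result directory.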

lovel8 commented 1 year ago

@vmx It is recommended to support the following functional requirements:

  1. For performance testing
    • Add configuration support for the total number of task cycles to execute, to verify the stability of the program and of its computational efficiency.
    • Add support for configuring the number of concurrent task executions in each stage, such as 30 P1s and 4 P2s concurrently, to adapt to real system resources (CPU, GPU, memory limitations) and achieve maximum resource utilization.
    • Add a statistics log of maximum system resource usage during runtime (e.g. CPU, GPU, memory) for analysis and optimization.
  2. For troubleshooting: add support for rerunning benchy from the failing phase (e.g. P2) after a Lotus panic, to reproduce and debug the problem.

vmx commented 1 year ago

> 1. For performance testing
>     • Add configuration support for the total number of task cycles to execute, to verify the stability of the program and of its computational efficiency.
>     • Add support for configuring the number of concurrent task executions in each stage, such as 30 P1s and 4 P2s concurrently, to adapt to real system resources (CPU, GPU, memory limitations) and achieve maximum resource utilization.
>     • Add a statistics log of maximum system resource usage during runtime (e.g. CPU, GPU, memory) for analysis and optimization.

Those are probably out of scope. The idea is to have binaries, so that you can build those tools on top of them. You could create your own runners that do exactly the testing that you need.
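
As an illustration of such a runner, here is a minimal sketch of capping how many tasks of one stage run at the same time (for example 4 P2s). It is not part of rust-fil-proofs; `run_bounded` is a hypothetical helper, and each job is a placeholder for whatever actually starts a phase, such as spawning one of the proposed binaries.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Runs `jobs` on at most `limit` worker threads and returns the results in
/// completion order. Each job stands in for one phase invocation (an
/// assumption of this sketch, e.g. spawning a per-phase binary).
fn run_bounded<T, F>(jobs: Vec<F>, limit: usize) -> Vec<T>
where
    T: Send,
    F: FnOnce() -> T + Send,
{
    let queue = Mutex::new(VecDeque::from(jobs));
    let results = Mutex::new(Vec::new());
    // Scoped threads may borrow `queue` and `results`; all workers are
    // joined before `thread::scope` returns.
    thread::scope(|scope| {
        for _ in 0..limit.max(1) {
            scope.spawn(|| loop {
                // Take the next job; the lock is released before the job runs.
                let job = queue.lock().unwrap().pop_front();
                match job {
                    Some(job) => {
                        let out = job();
                        results.lock().unwrap().push(out);
                    }
                    None => break,
                }
            });
        }
    });
    results.into_inner().unwrap()
}
```

A runner could call this once per stage with a different `limit`, so that the concurrency matches the CPU, GPU and memory actually available. The total number of cycles is then just the length of the `jobs` vector.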

> 2. For troubleshooting: add support for rerunning benchy from the failing phase (e.g. P2) after a Lotus panic, to reproduce and debug the problem.

Yes, ideally it should be possible to run just a certain step on the data you already have.
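
In a runner, "start at a certain step" could come down to picking the tail of the phase list. This is a rough sketch only: the phase names are taken from the example commands earlier in this thread, `phases_from` is a hypothetical helper, and treating "P1" / "P2" as aliases for `pc1` / `pc2` is an assumption.

```rust
/// Phase names in sealing order, as used in the example commands above.
static PHASES: [&str; 5] = ["ap", "pc1", "pc2", "c1", "c2"];

/// Returns the phases to run when resuming at `start`, or `None` if the name
/// is unknown. Matching is case-insensitive and accepts the short aliases
/// "p1" and "p2" (an assumption of this sketch).
fn phases_from(start: &str) -> Option<&'static [&'static str]> {
    let lower = start.to_ascii_lowercase();
    let name = match lower.as_str() {
        "p1" => "pc1",
        "p2" => "pc2",
        other => other,
    };
    let idx = PHASES.iter().position(|p| *p == name)?;
    Some(&PHASES[idx..])
}
```

For example, `phases_from("P2")` yields `pc2`, `c1` and `c2`, so a sector that panicked during P2 could be rerun from there on the data the earlier phases already produced.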

vmx commented 1 year ago

Some of the requirements re-formulated as user stories:

As a storage provider I'd like to

If anyone has more, please share them here.

RobQuistNL commented 1 year ago

Yes! :)

Clear documentation (or examples) on how to run the various parts, what data they need & generate, how to pass this data through etc.

With this in place, the Supranational updates would also be easier to implement.