flyvy-verifier / flyvy

An experimental framework for temporal verification based on first-order linear-time temporal logic. Our goal is to express transition systems in first-order logic and verify temporal correctness properties, including safety and liveness.
BSD 2-Clause "Simplified" License
13 stars 1 forks source link

Add scripts to produce performance benchmarks across all examples #118

Open tchajed opened 1 year ago

tchajed commented 1 year ago

We should have a way to run performance benchmarks across all the examples, which encodes the configuration for algorithms like inference and bounded model checking and gathers timing and memory usage.

I envision this being written in Rust, with a small library of functions for running the flyvy binary and gathering statistics. We'll use the same mechanism as time, but perhaps implemented directly in Rust calling getrusage directly rather than wrapping the system binary. This will make it possible to get gather memory usage over time as well by sampling getrusage.

Using getrusage isn't completely trivial because it only gets statistics for the calling process (and children), not for an arbitrary child pid. I may implement this by starting out using time and then replacing it with our own version that reports the results in JSON.

wilcoxjay commented 1 year ago

Thanks for filing this issue! I have a couple of questions.

tchajed commented 1 year ago

macOS has the same getrusage so yes, it should be the same.

You can get resource usage for all children of the calling process with RUSAGE_CHILDREN. /usr/bin/time works because it measures itself + the child it spawns, and we assume that time itself has basically no overhead. The benchmarking can also do this but then it needs to spawn a process that spawns temporal-verifier.

wilcoxjay commented 1 year ago

I see, so the problem with using getrusage with RUSAGE_CHILDREN from our hypothetical top-level rust benchmarking script is that it would measure the sum of all the benchmarks, rather than the individual benchmarks? And so the intermediate process would be responsible for calling getrusage with RUSAGE_CHILDREN so that it only measures the one benchmark.