bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0

Benchmarks show degraded performance for functions called through FFI #691

Closed by Janmajayamall 1 year ago

Janmajayamall commented 1 year ago

I am writing Rust bindings for a C++ library. To test performance, I have written a few benchmarks for the bindings. However, the benchmarks show that C++ functions called through FFI perform a lot worse (~2x slower) than the library's own C++ benchmarks report. I initially suspected the FFI call itself, but that is not the cause. I timed a specific function the simple way, that is, by running it many times and reading the clock before and after. The time per iteration matches the C++ benchmarks.

You can try this yourself (it only works on an Intel machine). Clone the repository and run cargo bench modulus/mul_mod_vec/n=32768/logq=60. On my machine it reports 40.520 µs. The code that times the same function with the same inputs the simple way lives in main.rs. Running cargo run --release prints Time: 27.855µs on my machine, which is close to what the C++ benchmarks report for the same function, i.e. ~30 µs.
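For readers without the repository at hand, the "simple way" of timing described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual contents of main.rs: the `mul_mod_vec` body is a pure-Rust stand-in for the FFI call, and `time_per_iter`, the modulus, and the iteration count are all made up for the example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the FFI function under test:
/// element-wise a[i] = a[i] * b[i] mod q.
fn mul_mod_vec(a: &mut [u64], b: &[u64], q: u64) {
    for (x, y) in a.iter_mut().zip(b) {
        *x = ((*x as u128 * *y as u128) % q as u128) as u64;
    }
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
fn time_per_iter<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    let n = 32768;
    let q: u64 = (1 << 60) - 93; // arbitrary 60-bit value, not taken from the issue
    let mut a = vec![3u64; n];
    let b = vec![5u64; n];

    // The same buffers are reused on every call, so they stay warm in cache.
    let per_iter = time_per_iter(100, || {
        mul_mod_vec(black_box(&mut a), black_box(&b), q);
    });
    println!("Time: {:?}", per_iter);
}
```

Note that this loop reuses the same two buffers on every iteration, which matters when comparing against a harness that creates a fresh input for each call.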

It would be helpful if someone could help me figure out the reason for the difference.

Thanks!

Janmajayamall commented 1 year ago

The issue was on my end and was caused by setting the batch size to BatchSize::SmallInput. With BatchSize::LargeInput, the benchmark results are as expected.
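In Criterion, the batch size is the third argument to `Bencher::iter_batched(setup, routine, size)`, and `SmallInput` uses larger batches than `LargeInput`, so more inputs are held in memory at once. The thread does not state why this changed the numbers; one plausible explanation, offered here only as an assumption, is that many 32768-element vectors alive at the same time no longer fit in cache. The sketch below imitates batching with the standard library alone so the effect can be explored without the crate. `time_batched` is a hypothetical helper, not Criterion's implementation, and the batch sizes are arbitrary.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `routine` over `iters` inputs, creating `batch` inputs up front per round.
/// Setup is excluded from the timing; each input is consumed (and dropped)
/// inside the timed loop.
fn time_batched<I, S, R>(iters: usize, batch: usize, mut setup: S, mut routine: R) -> Duration
where
    S: FnMut() -> I,
    R: FnMut(I),
{
    assert!(iters > 0 && batch > 0);
    let mut total = Duration::ZERO;
    let mut done = 0;
    while done < iters {
        let this_batch = batch.min(iters - done);
        // Every input of the batch is alive at once:
        // big batch * big input = big working set.
        let inputs: Vec<I> = (0..this_batch).map(|_| setup()).collect();
        let start = Instant::now();
        for input in inputs {
            routine(input);
        }
        total += start.elapsed();
        done += this_batch;
    }
    total / iters as u32
}

fn main() {
    let n = 32768; // 32768 u64 values = 256 KiB per input
    let setup = || vec![1u64; n];
    let routine = |v: Vec<u64>| {
        black_box(v.iter().sum::<u64>());
    };
    let iters = 1_000;

    // Many inputs alive per batch versus one input alive at a time.
    let many_live = time_batched(iters, 100, setup, routine);
    let one_live = time_batched(iters, 1, setup, routine);
    println!("batch=100: {:?} per iteration", many_live);
    println!("batch=1:   {:?} per iteration", one_live);
}
```

Whether the two figures differ, and by how much, depends on the machine's cache sizes and allocator, so treat the output as something to measure rather than a predicted result.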