bheisler / iai

Experimental one-shot benchmarking/profiling harness for Rust
Apache License 2.0

Cachegrind failure in non-privileged docker container (e.g. CircleCI) #15

Open mplanchard opened 3 years ago

mplanchard commented 3 years ago

My benchmarks run fine locally (Ubuntu 20.04) but fail in CI (Debian Buster). Valgrind is installed in the CI image, and I've confirmed that it can be run directly, like:

cargo bench --no-run --all-features
exc=$(ls target/release/deps/ | grep -e '^iai[^.]\+$')
valgrind \
  -d \
  -v \
  --tool=cachegrind \
  --I1=32768,8,64 \
  --D1=32768,8,64 \
  --LL=8388608,16,64 \
  --cachegrind-out-file=cachegrind.out \
  "target/release/deps/$exc" \
  --iai-run 0 

However, when I run cargo bench, I get a failure like:

Running `/home/circleci/project/target/release/deps/iai-b3e03a1f9e4644b7 iai --bench`
thread 'main' panicked at 'Failed to run benchmark in cachegrind. Exit code: exit code: 1', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/iai-0.1.1/src/lib.rs:118:9

The interesting portion of the backtrace is:

  15:     0x558f5004b227 - iai::run_bench::h77107b12d80265f1
  16:     0x558f5004ccd7 - iai::runner::hf910ff229467010c
  17:     0x558f500457f3 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd07c56481eb04e03
  18:     0x558f500457b9 - std::rt::lang_start::{{closure}}::h11bcbb207c0366c3
  19:     0x558f50070a07 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h527fb2333ede305e
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/ops/function.rs:259:13
  20:     0x558f50070a07 - std::panicking::try::do_call::h309d8aee8149866c
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:379:40
  21:     0x558f50070a07 - std::panicking::try::h75a60c31fd16bfc6
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:343:19
  22:     0x558f50070a07 - std::panic::catch_unwind::h1f9892423e99bc00
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panic.rs:431:14
  23:     0x558f50070a07 - std::rt::lang_start_internal::hd5b67df56ca01dae
                               at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/rt.rs:51:25
  24:     0x558f50045132 - main
  25:     0x7f7cda2b009b - __libc_start_main
  26:     0x558f5004502a - _start
  27:                0x0 - <unknown>

I've tried getting more out of valgrind by running with the VALGRIND_OPTS environment variable set to "-v" and "-d -v", but that doesn't help: there's still nothing on stdout, and the target/iai directory is never created.

I'd really appreciate any suggestions on how to debug this further!

Anton-4 commented 3 years ago

I've hit the same issue using earthly with a (slim) debian buster docker image.

Anton-4 commented 3 years ago

I set up a repo so anyone can reproduce this issue locally.

Anton-4 commented 3 years ago

I will look into debugging this.

Anton-4 commented 3 years ago

I was able to find the cause: setarch: failed to set personality to x86_64: Operation not permitted.

This can be fixed by using the --privileged flag with docker run. For earthly the command is as follows: earthly --allow-privileged +my_bench, and inside the Earthfile: RUN --privileged cargo bench my_benchmark.
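To check whether a given container is affected before running the whole benchmark suite, something like the following probe might help. It is only a sketch: it assumes the failing call is setarch with the -R (disable address randomization) flag, which is a guess based on the error message above rather than something confirmed from iai's source, and probe_setarch is a made-up name.

```rust
use std::env::consts::ARCH;
use std::process::Command;

/// Hypothetical helper: tries to run `true` under `setarch <arch> -R`.
/// In an unprivileged container this is expected to fail with an
/// "Operation not permitted" message on stderr.
fn probe_setarch() -> Result<(), String> {
    let output = Command::new("setarch")
        .arg(ARCH) // e.g. "x86_64"
        .arg("-R") // assumption: the flag that triggers the personality change
        .arg("true")
        .output()
        .map_err(|e| format!("could not start setarch: {}", e))?;

    if output.status.success() {
        Ok(())
    } else {
        Err(format!(
            "setarch exited with {}: {}",
            output.status,
            String::from_utf8_lossy(&output.stderr).trim()
        ))
    }
}
```

Calling probe_setarch() from a small binary in the CI image and printing the Err value should show the same "failed to set personality" message if the container lacks the required privileges.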

A good addition to iai might be to print the whole command output when the exit status is unsuccessful. That would have made it easier to find the root cause of this issue.
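A rough sketch of what that could look like, using only std::process. This is not iai's actual code; run_reporting_output is a hypothetical helper name, and how it would be wired into the harness is left open.

```rust
use std::process::{Command, Output};

/// Hypothetical helper: runs a command and, if it exits with a non-zero
/// status, returns an error string containing the captured stdout and stderr.
fn run_reporting_output(cmd: &mut Command) -> Result<Output, String> {
    let output = cmd
        .output()
        .map_err(|e| format!("failed to start {:?}: {}", cmd, e))?;

    if output.status.success() {
        return Ok(output);
    }

    // Include everything the child printed so the root cause is visible.
    Err(format!(
        "command {:?} failed with {}\n--- stdout ---\n{}\n--- stderr ---\n{}",
        cmd,
        output.status,
        String::from_utf8_lossy(&output.stdout),
        String::from_utf8_lossy(&output.stderr),
    ))
}
```

With something like this in place, the panic message would have carried the setarch error text directly instead of only "Exit code: exit code: 1".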

mplanchard commented 3 years ago

Ah, I'm so glad you were able to figure it out! For Circle, I think we can look into running this job on a machine executor instead of a docker container, since to my knowledge Circle doesn't give you a way to run a privileged container.