bheisler / iai

Experimental one-shot benchmarking/profiling harness for Rust
Apache License 2.0
563 stars 35 forks source link

Unexpected Results #32

Open winksaville opened 1 year ago

winksaville commented 1 year ago

To start, I want to thank @bheisler for all of the effort you've put into criterion and iai!!

I've been experimenting with iai and really like the notion of "one-shot" measuring for low level benchmarks. I've played round with it, but sometimes I get unexpected results. This could definitely be an error on my part, that is usually the case, but I've been unable to track it down and thus I've created this issue.

Of note, I get very consistent results if I do multiple runs of a single configuration. But sometimes I run into problems when I change something or run a slightly different command. I then can get results that look wrong to me.

First off, I use an Arch Linux system for development:

$ uname -a
Linux 3900x 6.0.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 08 Dec 2022 11:03:38 +0000 x86_64 GNU/Linux

$ inxi -c
CPU: 12-Core AMD Ryzen 9 3900X (-MT MCP-) speed/min/max: 2200/2200/3800 MHz Kernel: 6.0.12-arch1-1 x86_64 Up: 9h 37m 
Mem: 6392.5/32019.1 MiB (20.0%) Storage: 465.76 GiB (136.4% used) Procs: 481 Shell: bash 5.1.16 inxi: 3.1.03 

I've created exper-iai with cargo new --lib which creates a lib with an fn add and test it_works:

$ cat -n src/lib.rs
     1  pub fn add(left: usize, right: usize) -> usize {
     2      left + right
     3  }
     4
     5  #[cfg(test)]
     6  mod tests {
     7      use super::*;
     8
     9      #[test]
    10      fn it_works() {
    11          let result = add(2, 2);
    12          assert_eq!(result, 4);
    13      }
    14  }

I then added a simple fn main:

$ cat -n src/main.rs
     1  use exper_iai::add;
     2
     3  fn main() {
     4      let r = add(3, 3);
     5      assert_eq!(r, 6);
     6      println!("{r}");
     7  }

And the iai benchmark is:

$ cat -n benches/bench_iai.rs 
     1  use iai::black_box;
     2  use exper_iai::add;
     3
     4  fn bench_iai_add() {
     5      black_box(add(2, 2));
     6  }
     7
     8  iai::main!(bench_iai_add,);

I also created gen_asm.sh so I could see the generated assembler.

$ cat -n asm/add.txt
     1  .section .text.exper_iai::add,"ax",@progbits
     2          .globl  exper_iai::add
     3          .p2align        4, 0x90
     4          .type   exper_iai::add,@function
     5  exper_iai::add:
     6
     7          .cfi_startproc
     8          lea rax, [rdi + rsi]
     9          ret
    10
    11          .size   exper_iai::add, .Lfunc_end0-exper_iai::add

$ cat -n asm/bench_iai_add.txt 
     1  .section .text.bench_iai::iai_wrappers::bench_iai_add,"ax",@progbits
     2          .p2align        4, 0x90
     3          .type   bench_iai::iai_wrappers::bench_iai_add,@function
     4  bench_iai::iai_wrappers::bench_iai_add:
     5
     6          .cfi_startproc
     7          push rax
     8          .cfi_def_cfa_offset 16
     9
    10          mov edi, 2
    11          mov esi, 2
    12          call qword ptr [rip + exper_iai::add@GOTPCREL]
    13          mov qword ptr [rsp], rax
    14
    15          mov rax, qword ptr [rsp]
    16
    17          pop rax
    18          .cfi_def_cfa_offset 8
    19          ret
    20
    21          .size   bench_iai::iai_wrappers::bench_iai_add, .Lfunc_end5-bench_iai::iai_wrappers::bench_iai_add

Also in Cargo.toml I added [profile.dev] and [profile.release] as well as I added rust-toolchain.toml to keep the toolchain consistent:

$ cat -n Cargo.toml
     1  [package]
     2  name = "exper_iai"
     3  authors = [ "Wink Saville <wink@saville.com" ]
     4  license = "MIT OR Apache-2.0"
     5  version = "0.1.0"
     6  edition = "2021"
     7
     8  # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
     9
    10  [dev-dependencies]
    11  criterion = "0.4.0"
    12  iai = "0.1.1"
    13
    14  [[bench]]
    15  name = "bench_iai"
    16  path = "benches/bench_iai.rs"
    17  harness = false
    18
    19
    20  [features]
    21
    22  # From: https://doc.rust-lang.org/cargo/reference/profiles.html#dev
    23  [profile.dev]
    24  opt-level = 0
    25  debug = true
    26  #split-debuginfo = '...'  # Platform-specific.
    27  debug-assertions = true
    28  overflow-checks = true
    29  lto = false
    30  panic = 'unwind'
    31  incremental = true
    32  codegen-units = 256
    33  rpath = false
    34
    35  # From: https://doc.rust-lang.org/cargo/reference/profiles.html#release
    36  [profile.release]
    37  opt-level = 3
    38  debug = false
    39  #split-debuginfo = '...'  # Platform-specific.
    40  debug-assertions = false
    41  overflow-checks = false
    42  lto = false
    43  panic = 'unwind'
    44  incremental = false
    45  codegen-units = 1
    46  rpath = false

$ cat -n rust-toolchain.toml 
     1  [toolchain]
     2  channel = "stable"
     3  #channel = "nightly"

Running main and test work as expected:

$ cargo run
   Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33s
     Running `target/debug/exper_iai`
6
$ cargo test
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
   Compiling exper_iai v0.1.0 (/home/wink/prgs/rust/myrepos/exper-iai)
    Finished test [unoptimized + debuginfo] target(s) in 8.58s
     Running unittests src/lib.rs (target/debug/deps/exper_iai-854898c18c69642d)

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/exper_iai-6092fd66897760dc)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests exper_iai

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

And running cargo bench and yields a more or less expected result:

$ cargo clean
$ cargo bench
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
    Finished bench [optimized] target(s) in 20.33s
     Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)

running 1 test
test tests::it_works ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8
  L1 Accesses:                   12
  L2 Accesses:                    2
  RAM Accesses:                   2
  Estimated Cycles:              92

And here are two more runs of just bench_iai showing the expected consistency:

$ cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8 (No change)
  L1 Accesses:                   12 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:              92 (No change)

$ cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   8 (No change)
  L1 Accesses:                   12 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:              92 (No change)

Here is a my first unexpected result, if I change the command line, adding taskset -c 0, I wouldn't expect significantly different results, but Instructions is 0, that is unexpected:

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (-100.0000%)
  L1 Accesses:      18446744073709551615 (+153722867280912908288%)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (+50.00000%)
  Estimated Cycles:             114 (+23.91304%)

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (No change)
  L1 Accesses:      18446744073709551615 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (No change)
  Estimated Cycles:             114 (No change)

$ taskset -c 0 cargo bench --bench bench_iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/bench_iai.rs (target/release/deps/bench_iai-e75a6910d1576500)
bench_iai_add
  Instructions:                   0 (No change)
  L1 Accesses:      18446744073709551615 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   3 (No change)
  Estimated Cycles:             114 (No change)

But a bigger problem is if rename bench_iai.rs to iai.rs and the bench within that file from bench_iai_add to iai_add:

$ cat -n benches/iai.rs 
     1  use iai::black_box;
     2  use exper_iai::add;
     3  
     4  fn iai_add() {
     5      black_box(add(2, 2));
     6  }
     7  
     8  iai::main!(iai_add,);

And then I make the necessary changes to get things working again, see exper-iai branch rename-bench_iai_add-to-iai_add. In my opinion only "labels" have changed and none of the actual assembler code has changed.

But now I get really unexpected results, I switch branches and then clean and rerun bench and now Instructions has a value of 22:

$ git switch main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ git switch rename-bench_iai_add-to-iai-add 
Switched to branch 'rename-bench_iai_add-to-iai-add'
$ cargo clean
$ cargo bench
   Compiling autocfg v1.1.0
   Compiling proc-macro2 v1.0.47
...
   Compiling tinytemplate v1.2.1
   Compiling criterion v0.4.0
    Finished bench [optimized] target(s) in 20.60s
     Running unittests src/lib.rs (target/release/deps/exper_iai-e0c596df81667934)

running 1 test
test tests::it_works ... ignored

test result: ok. 0 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/release/deps/exper_iai-bbf641b3842b4eea)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22
  L1 Accesses:                   34
  L2 Accesses:                    2
  RAM Accesses:                   2
  Estimated Cycles:             114

$ cargo bench --bench iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22 (No change)
  L1 Accesses:                   34 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:             114 (No change)

$ cargo bench --bench iai
    Finished bench [optimized] target(s) in 0.02s
     Running benches/iai.rs (target/release/deps/iai-1d6df879cc9849e1)
iai_add
  Instructions:                  22 (No change)
  L1 Accesses:                   34 (No change)
  L2 Accesses:                    2 (No change)
  RAM Accesses:                   2 (No change)
  Estimated Cycles:             114 (No change)