better support for big.LITTLE processors

I was benchmarking buffer sizes for a specific encryption task when I noticed some WILD variability in the results. Some variability is totally normal, but this was pretty drastic.

The test encrypted a large file using different buffer sizes (1KB to 128MB).

My immediate thought was that, because this is a 13th Gen Intel processor running on Linux, the scheduler is probably placing the benchmarks on the performance cores (P-cores) and efficiency cores (E-cores) at random (or based on thermals, etc.).

Here are two runs:

Run 1

| Size  | Throughput   |
| ----- | ------------ |
| 1KB   | 343.47 MiB/s |
| 2KB   | 389.98 MiB/s |
| 4KB   | 516.36 MiB/s |
| 8KB   | 814.32 MiB/s |
| 16KB  | 636.23 MiB/s |
| 32KB  | 815.35 MiB/s |
| 64KB  | 1.0531 GiB/s |
| 128KB | 888.02 MiB/s |
| 256KB | 879.91 MiB/s |
| 512KB | 909.79 MiB/s |
| 1MB   | 1.4063 GiB/s |
| 2MB   | 992.26 MiB/s |
| 4MB   | 912.35 MiB/s |
| 8MB   | 935.68 MiB/s |
| 16MB  | 774.32 MiB/s |
| 32MB  | 1.0185 GiB/s |
| 64MB  | 663.76 MiB/s |
| 128MB | 865.33 MiB/s |

Run 2

| Size  | Throughput   |
| ----- | ------------ |
| 1KB   | 539.31 MiB/s |
| 2KB   | 431.18 MiB/s |
| 4KB   | 487.79 MiB/s |
| 8KB   | 594.78 MiB/s |
| 16KB  | 1.1682 GiB/s |
| 32KB  | 752.63 MiB/s |
| 64KB  | 779.33 MiB/s |
| 128KB | 796.46 MiB/s |
| 256KB | 1.2271 GiB/s |
| 512KB | 816.28 MiB/s |
| 1MB   | 932.12 MiB/s |
| 2MB   | 1.3587 GiB/s |
| 4MB   | 792.06 MiB/s |
| 8MB   | 1.3609 GiB/s |
| 16MB  | 1.3672 GiB/s |
| 32MB  | 707.43 MiB/s |
| 64MB  | 680.69 MiB/s |
| 128MB | 1.1341 GiB/s |

You can see, for instance, that in Run 1 the 16KB case ran on an E-core (636 MiB/s), while in Run 2 it ran on a P-core (1.17 GiB/s).

Now if I pin the benchmark to the P-cores alone using `taskset`, we get:

Pinned to P-cores via `taskset`

| Size  | Throughput   |
| ----- | ------------ |
| 1KB   | 525.56 MiB/s |
| 2KB   | 693.22 MiB/s |
| 4KB   | 855.16 MiB/s |
| 8KB   | 1.0266 GiB/s |
| 16KB  | 1.1832 GiB/s |
| 32KB  | 1.2746 GiB/s |
| 64KB  | 1.3139 GiB/s |
| 128KB | 1.3787 GiB/s |
| 256KB | 1.3885 GiB/s |
| 512KB | 1.3932 GiB/s |
| 1MB   | 1.4098 GiB/s |
| 2MB   | 1.3845 GiB/s |
| 4MB   | 1.3864 GiB/s |
| 8MB   | 1.3965 GiB/s |
| 16MB  | 1.3678 GiB/s |
| 32MB  | 1.2770 GiB/s |
| 64MB  | 1.2423 GiB/s |
| 128MB | 1.1526 GiB/s |

I'm not sure what the best route forward would be, other than maybe just warning the user that they're using a big.LITTLE CPU and benchmarks could be wildly inaccurate?
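For the warning idea, here is a rough sketch of how a hybrid layout might be detected on Linux. It assumes the kernel exposes the `/sys/devices/cpu_core/cpus` and `/sys/devices/cpu_atom/cpus` files, which as far as I know is specific to Intel hybrid parts on recent kernels; ARM big.LITTLE systems would need a different check. The function names are made up for illustration and are not part of criterion.rs.

```rust
use std::fs;

/// Parse a Linux CPU list string such as "0-7,16" into individual CPU ids.
/// Malformed entries are skipped rather than treated as errors.
fn parse_cpu_list(s: &str) -> Vec<usize> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // A range like "0-7" expands to every id in between, inclusive.
            Some((lo, hi)) => {
                if let (Ok(lo), Ok(hi)) = (lo.trim().parse::<usize>(), hi.trim().parse::<usize>()) {
                    cpus.extend(lo..=hi);
                }
            }
            // A single id like "16".
            None => {
                if let Ok(cpu) = part.trim().parse::<usize>() {
                    cpus.push(cpu);
                }
            }
        }
    }
    cpus
}

/// Read a sysfs CPU list; returns None if the file is missing or unreadable.
fn read_cpu_list(path: &str) -> Option<Vec<usize>> {
    fs::read_to_string(path).ok().map(|s| parse_cpu_list(&s))
}

/// Returns (P-core CPUs, E-core CPUs) if the kernel reports both core types.
/// Assumption: these sysfs paths only exist on Intel hybrid CPUs under Linux.
fn hybrid_layout() -> Option<(Vec<usize>, Vec<usize>)> {
    let p = read_cpu_list("/sys/devices/cpu_core/cpus")?;
    let e = read_cpu_list("/sys/devices/cpu_atom/cpus")?;
    if p.is_empty() || e.is_empty() {
        None
    } else {
        Some((p, e))
    }
}

fn main() {
    match hybrid_layout() {
        Some((p, e)) => eprintln!(
            "warning: hybrid CPU detected (P-core CPUs {:?}, E-core CPUs {:?}); \
             results may vary depending on where the benchmark is scheduled",
            p, e
        ),
        None => println!("no hybrid layout reported"),
    }
}
```

On a machine without those files this just prints the "no hybrid layout" message, so the check would be harmless on non-hybrid systems.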

There are ways to pin the benchmarks to specific cores in code, but that feels like something the user should be in charge of, since there may be times when I want to run a benchmark on E-cores rather than P-cores.
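To make "pinning in code" concrete, here is a minimal sketch that does roughly what `taskset` does, by calling the Linux `sched_setaffinity` syscall wrapper directly. It assumes 64-bit Linux with a 1024-bit `cpu_set_t` and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later). `build_mask` and `pin_current_thread` are hypothetical helpers, not criterion.rs API, and the CPU range in `main` is a placeholder since the right ids depend on the machine.

```rust
use std::io;

/// Number of 64-bit words in a 1024-bit cpu_set_t (assumes 64-bit Linux).
const MASK_WORDS: usize = 16;

unsafe extern "C" {
    // libc: int sched_setaffinity(pid_t pid, size_t cpusetsize, const cpu_set_t *mask);
    fn sched_setaffinity(pid: i32, cpusetsize: usize, mask: *const u64) -> i32;
}

/// Build an affinity bitmask with one bit set per requested CPU id.
/// Ids beyond the mask's capacity are ignored.
fn build_mask(cpus: &[usize]) -> [u64; MASK_WORDS] {
    let mut mask = [0u64; MASK_WORDS];
    for &cpu in cpus {
        if cpu < MASK_WORDS * 64 {
            mask[cpu / 64] |= 1u64 << (cpu % 64);
        }
    }
    mask
}

/// Restrict the calling thread to the given CPUs (pid 0 means "this thread").
fn pin_current_thread(cpus: &[usize]) -> io::Result<()> {
    let mask = build_mask(cpus);
    // SAFETY: `mask` is a valid, initialized buffer and we pass its exact size.
    let rc = unsafe { sched_setaffinity(0, std::mem::size_of_val(&mask), mask.as_ptr()) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

fn main() {
    // Placeholder: assume CPUs 0-15 are the P-core hardware threads.
    let p_cores: Vec<usize> = (0..16).collect();
    match pin_current_thread(&p_cores) {
        Ok(()) => println!("pinned to {:?}", p_cores),
        Err(e) => eprintln!("could not pin: {}", e),
    }
}
```

The affinity only applies to the calling thread (and threads it spawns afterwards), so something like this would have to run before the measurement loop starts.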

I could see some CLI flags exposing this behavior for an advanced user perhaps.

bheisler / criterion.rs

better support for big.LITTLE processors #735