celerity / celerity-runtime

High-level C++ for Accelerator Clusters
https://celerity.github.io
MIT License

CI: Core pinning for benchmark runs to reduce variance #226

Closed: PeterTh closed this pull request 11 months ago

PeterTh commented 11 months ago

We noticed extremely large (~100%) differences in the runtime of specific benchmarks between runs on the current CI target system, and these differences are related to thread scheduling. To mitigate this issue, the CI benchmark script now pins the benchmarks to a small subset of cores.

As the following data from 2 benchmark runs on master and 2 on this branch show, this change reduces overall runtimes and, more importantly, reduces the standard deviation, both overall and in particular for the system benchmarks.

[Figures: std_devs, times]
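The charts themselves are not reproduced here. To make "reduces the standard deviation" concrete, here is a minimal sketch, assuming each benchmark yields a list of runtime samples, of the sample standard deviation and of the relative standard deviation (coefficient of variation), which lets benchmarks with very different absolute runtimes be compared. The function names are hypothetical and are not part of Celerity's benchmark tooling.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the runtime samples.
double mean_of(const std::vector<double>& runtimes) {
    assert(!runtimes.empty() && "need at least one sample");
    return std::accumulate(runtimes.begin(), runtimes.end(), 0.0) /
           static_cast<double>(runtimes.size());
}

// Sample standard deviation (divides by n - 1), e.g. of runtimes in ms.
double sample_stddev(const std::vector<double>& runtimes) {
    assert(runtimes.size() >= 2 && "need at least two samples");
    const double mean = mean_of(runtimes);
    double sum_sq = 0.0;
    for (double t : runtimes) sum_sq += (t - mean) * (t - mean);
    return std::sqrt(sum_sq / static_cast<double>(runtimes.size() - 1));
}

// Relative standard deviation in percent: stddev / mean * 100.
// A benchmark whose runtime doubles on some runs shows up here as a large value.
double relative_stddev_percent(const std::vector<double>& runtimes) {
    return 100.0 * sample_stddev(runtimes) / mean_of(runtimes);
}
```

Comparing `relative_stddev_percent` per benchmark before and after pinning is one way to quantify the variance reduction described above.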

The initial idea was to use mpirun core pinning, but this fails because hwloc does not work within the CI Docker image. The script now uses taskset instead, since it produces equivalent results.
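The CI script and the exact cores it pins to are not shown in this thread. As a minimal sketch, assuming Linux with glibc, this is roughly the mechanism taskset relies on: it sets the CPU affinity mask with `sched_setaffinity` and then execs the command, which inherits that mask. From a shell, the equivalent is along the lines of `taskset -c <cpu-list> <benchmark binary>`. The helper names `pin_to_cores` and `allowed_core_count` are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for cpu_set_t, CPU_* macros and sched_*affinity in glibc
#endif
#include <sched.h>
#include <cstdio>
#include <vector>

// Restrict the calling thread (pid 0) to the given CPU cores. Threads it
// creates afterwards inherit the mask, and so do children across fork/exec,
// which is what `taskset -c <list> <command>` depends on.
bool pin_to_cores(const std::vector<int>& cores) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core : cores) CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return false;
    }
    return true;
}

// Number of cores the calling thread is currently allowed to run on,
// or -1 on error.
int allowed_core_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    return CPU_COUNT(&set);
}
```

Calling `pin_to_cores` once at startup, before any worker threads exist, confines the whole process to the chosen cores. Doing it externally with taskset keeps the benchmark binaries unchanged, which matches the approach taken here.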

PeterTh commented 11 months ago

Yes, this is a bit regrettable, but on purpose. Using more cores increases the variance on gpuc2 by a noticeable amount.

Since that issue is specific to gpuc2 (which is a pretty old architecture), we probably don't need to investigate it in detail or change the recommendation.

PeterTh commented 11 months ago

I didn't want it to run general CI, since that would occupy gpuc2 (and there are no relevant changes to general CI), and I wanted to be able to quickly run some more benchmarks on it.

The benchmark report is also disabled as a consequence, but it would be meaningless anyway. I guess it would look good, though, with Celerity getting that much faster :P

fknorr commented 11 months ago

Ah, I didn't see the [skip ci] and was wondering if there was another CI hiccup :+1: