Closed PeterTh closed 11 months ago
Yes, this is a bit regrettable, but it's on purpose. Using more cores increases the variance on gpuc2 by a noticeable amount.
Since that is an issue specific to gpuc2
(which is a pretty old architecture) we probably don't need to investigate that in detail, or change the recommendation.
I didn't want it to run the general CI since that would occupy gpuc2
(and there are no relevant general CI changes), and I wanted to be able to quickly run some more benchmarks on it.
The benchmark report is also disabled as a consequence of that, but it would be meaningless anyway. I guess it would look good though, Celerity getting that much faster :P
Ah, I didn't see the [skip ci]
and was wondering if there was another CI hiccup :+1:
We noticed that there are extremely large (~100%) differences in the runtime of specific benchmarks on the current CI target system, which are related to thread scheduling. In order to mitigate this issue, the CI benchmark script now pins the benchmarks to a small subset of cores.
As the following data from 2 benchmark runs on master and 2 on this branch shows, this both reduces overall runtimes and, more importantly, reduces the standard deviation overall and in particular for the system benchmarks.
The initial idea for this was to use `mpirun` core pinning, but that fails due to HWLOC not working within the CI Docker image. This now uses `taskset`, since that has equivalent results.
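For illustration, a minimal sketch of what such pinning could look like in a CI script (the binary name, core range, and the `mpirun` flags shown for comparison are assumptions, not the actual script):

```shell
#!/bin/sh
# Sketch: pin a benchmark process to a small, fixed subset of cores so the
# OS scheduler cannot migrate it across the whole machine between samples.

# taskset (util-linux) restricts the CPU affinity of the launched process.
# "./benchmarks" is a placeholder for the actual benchmark binary.
taskset -c 0-3 ./benchmarks

# The originally intended alternative: Open MPI's built-in binding, which
# relies on HWLOC for topology discovery (and thus fails if HWLOC cannot
# probe the topology, e.g. inside some Docker images):
#   mpirun --bind-to core -n 1 ./benchmarks
```

Whether `taskset` or MPI-level binding is used, the effect is the same: each benchmark run sees a stable set of cores, which removes the scheduling-induced run-to-run variance.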