Open VinInn opened 1 month ago
cms-bot internal usage
A new Issue was created by @VinInn.
@sextonkennedy, @rappoccio, @Dr15Jones, @smuzaffar, @makortel, @antoniovilela can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign core
New categories assigned: core
@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
For the record, on the new Bergamo machines (2x256 cores), HLT fails with CMSSW_14_0_7_MULTIARCHS
because
libgomp: Thread creation failed: Resource temporarily unavailable
Now I'm testing if export OMP_NUM_THREADS=1
works around the problem.
Now I'm testing if
export OMP_NUM_THREADS=1
works around the problem.
It does:
Running 4 times over 10300 events with 8 jobs, each with 64 threads, 48 streams and 1 GPUs
1312.8 ± 0.4 ev/s (10000 events, 99.1% overlap)
https://github.com/cms-sw/cmsdist/pull/9207 proposes to set OMP_NUM_THREADS=1
for cmssw runtime env
While it is fine as a temporary measure for the HLT, I'm not sure this is what we should do as a general setting.
A trivial example of what can go wrong:
$ nproc
512
$ export OMP_NUM_THREADS=1
$ nproc
1
https://github.com/cms-sw/cmsdist/pull/9207 proposes to set OMP_NUM_THREADS=1 for cmssw runtime env
mmmh, there are certainly applications (e.g. millepede alignment) that need more than 1 thread, see e.g.:
@cms-sw/alca-l2 @henriettepetersen @TomasKello FYI
may be call to omp_set_num_threads()
within cmsRun process
- OMP_NUM_THREADS (if present) specifies initially the number of threads;
- calls to omp_set_num_threads() override the value of OMP_NUM_THREADS;
- the presence of the num_threads clause overrides both other values.
In my opinion NO application should rely on default settings and should properly taylor the number of threads to the task in hands and the environment (as done for edm). Therefore setting export OMP_NUM_THREADS=1
is perfectly fine in my opinion.
btw: do not understand what nproc
has to to with openmp
in any case
[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ export OMP_NUM_THREADS=1
[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ nproc
1
For those interested to what others do w/r/t this issue please have a look to this https://github.com/joblib/threadpoolctl/blob/master/README.md
this is for python, still something similar in C++ could become useful in cmssw
instead of nproc
taskset -pc $$
pid 2177350's current affinity list: 0-255
taskset -p $$
pid 2177350's current affinity mask: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
is most probably more reliable (parsing the list/mask is left as an exercise to the reader)
From the framework perspective
cmsRun
Regarding XGBoost
xgboost
external could be built with USE_OPENMP=OFF
https://github.com/dmlc/xgboost/blob/21d95f3d8f23873a76f8afaad0fee5fa3e00eafe/CMakeLists.txt#L30pat::XGBooster
abstraction already in PhysicsTools/PatAlgos
, that does the proper XGBoosterSetParam(..., "nthread", "1")
call
PhysicsTools/PatAlgos
dependencies (@cms-sw/ml-l2 @cms-sw/reconstruction-l2 @cms-sw/xpog-l2)PhotonXGBoostEstimator
would be to use this XGBooster
abstractionGBRForest
could be used? (https://github.com/cms-sw/cmssw/issues/44923#issuecomment-2129982095 https://github.com/cms-sw/cmssw/issues/44542#issuecomment-2068155042) In which case all present and future users of XGBoost should really use the GBRForest
for the inference?There seems to exist a pat::XGBooster abstraction already in PhysicsTools/PatAlgos, that does the proper XGBoosterSetParam(..., "nthread", "1") call
- The abstraction should really be moved to its own package, do decouple it and PhysicsTools/PatAlgos dependencies
- One option for PhotonXGBoostEstimator would be to use this XGBooster abstraction
This seemed relatively easy to do so I gave it a go at https://github.com/cms-sw/cmssw/pull/45085. In case the GBRForest
implementation is already underway we can discard it.
as detailed in #44923 cmssw links openmp gcc library and at some point threads are spawn in number of ncpu-1 (for each cmssw thread)
setting
export OMP_NUM_THREADS=1
will make openmp not to spawn any thread (as gomp uses the main thread as thread-0)I suggest to make
export OMP_NUM_THREADS=1
default in scram (or to call the equivalnet API function in cmsRun)