cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.06k stars 4.24k forks source link

reduce impact of OPENMP #45040

Open VinInn opened 1 month ago

VinInn commented 1 month ago

as detailed in #44923 cmssw links openmp gcc library and at some point threads are spawn in number of ncpu-1 (for each cmssw thread)

setting export OMP_NUM_THREADS=1 will make openmp not to spawn any thread (as gomp uses the main thread as thread-0)

I suggest to make export OMP_NUM_THREADS=1 default in scram (or to call the equivalnet API function in cmsRun)

cmsbuild commented 1 month ago

cms-bot internal usage

cmsbuild commented 1 month ago

A new Issue was created by @VinInn.

@sextonkennedy, @rappoccio, @Dr15Jones, @smuzaffar, @makortel, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 month ago

assign core

cmsbuild commented 1 month ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard commented 1 month ago

For the record, on the new Bergamo machines (2x256 cores), HLT fails with CMSSW_14_0_7_MULTIARCHS because

libgomp: Thread creation failed: Resource temporarily unavailable

Now I'm testing if export OMP_NUM_THREADS=1 works around the problem.

fwyzard commented 1 month ago

Now I'm testing if export OMP_NUM_THREADS=1 works around the problem.

It does:

Running 4 times over 10300 events with 8 jobs, each with 64 threads, 48 streams and 1 GPUs
  1312.8 ±   0.4 ev/s (10000 events, 99.1% overlap)
smuzaffar commented 1 month ago

https://github.com/cms-sw/cmsdist/pull/9207 proposes to set OMP_NUM_THREADS=1 for cmssw runtime env

fwyzard commented 1 month ago

While it is fine as a temporary measure for the HLT, I'm not sure this is what we should do as a general setting.

A trivial example of what can go wrong:

$ nproc
512

$ export OMP_NUM_THREADS=1

$ nproc
1
mmusich commented 1 month ago

https://github.com/cms-sw/cmsdist/pull/9207 proposes to set OMP_NUM_THREADS=1 for cmssw runtime env

mmmh, there are certainly applications (e.g. millepede alignment) that need more than 1 thread, see e.g.:

https://github.com/cms-sw/cmssw/blob/f1b02548e1dfa1269f96ce117a39857e8d50cd89/Alignment/MillePedeAlignmentAlgorithm/test/test_mille.py#L153

@cms-sw/alca-l2 @henriettepetersen @TomasKello FYI

smuzaffar commented 1 month ago

may be call to omp_set_num_threads() within cmsRun process

-  OMP_NUM_THREADS (if present) specifies initially the number of threads;
-  calls to omp_set_num_threads() override the value of OMP_NUM_THREADS;
-  the presence of the num_threads clause overrides both other values.
VinInn commented 1 month ago

In my opinion NO application should rely on default settings and should properly taylor the number of threads to the task in hands and the environment (as done for edm). Therefore setting export OMP_NUM_THREADS=1 is perfectly fine in my opinion. btw: do not understand what nproc has to to with openmp in any case

[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ export OMP_NUM_THREADS=1
[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ nproc
1
VinInn commented 1 month ago

For those interested to what others do w/r/t this issue please have a look to this https://github.com/joblib/threadpoolctl/blob/master/README.md

this is for python, still something similar in C++ could become useful in cmssw

VinInn commented 1 month ago

instead of nproc

taskset -pc $$
pid 2177350's current affinity list: 0-255
taskset -p $$
pid 2177350's current affinity mask: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

is most probably more reliable (parsing the list/mask is left as an exercise to the reader)

makortel commented 1 month ago

From the framework perspective

Regarding XGBoost

mmusich commented 1 month ago

There seems to exist a pat::XGBooster abstraction already in PhysicsTools/PatAlgos, that does the proper XGBoosterSetParam(..., "nthread", "1") call

  • The abstraction should really be moved to its own package, do decouple it and PhysicsTools/PatAlgos dependencies
  • One option for PhotonXGBoostEstimator would be to use this XGBooster abstraction

This seemed relatively easy to do so I gave it a go at https://github.com/cms-sw/cmssw/pull/45085. In case the GBRForest implementation is already underway we can discard it.