Closed esheldon closed 5 months ago
Yes. This is a known problem. Cc @erykoff
This could be an issue running imsim if you assume you should, for example, set nproc to the number of cores on the machine.
Before running imsim or galsim you must set all the thread-count environment variables (OMP_NUM_THREADS and the like). I thought this would be put into imsim (galsim wants to keep the flexibility of implicit multithreading for reasons that I don't understand).
Note using 1 core vs 2 cores gave very similar run times as well, so I'm not sure what's using the extra cpu time.
https://github.com/lsst/utils/blob/main/python/lsst/utils/threads.py#L38-L57
It may be that @cwwalter is waiting for my standalone shut-it-all-down package which I'll put together during the break.
Implicit multithreading takes more resources and only occasionally improves runtime. Often it greatly increases the runtime by x10 or in some cases x100. I hates it.
Yes, that can happen if you end up oversubscribing the cores due to each proc (set by output.nproc) using more than one core per proc.
Setting OMP_NUM_THREADS to 1 does force it to use one core per proc, as set in output.nproc.
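To make the oversubscription arithmetic concrete, here is a minimal sketch. The helper name `is_oversubscribed` and its arguments are hypothetical and not part of imsim or galsim; it only compares procs times threads-per-proc against the core count.

```python
import os


def is_oversubscribed(nproc, threads_per_proc, n_cores=None):
    """Return True if nproc * threads_per_proc exceeds the available cores.

    Hypothetical helper for illustration only; n_cores defaults to what
    os.cpu_count() reports, falling back to 1 if that is unknown.
    """
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    return nproc * threads_per_proc > n_cores


# nproc equal to the core count, with each proc implicitly using 2 threads:
print(is_oversubscribed(nproc=8, threads_per_proc=2, n_cores=8))  # True
# Same nproc with implicit threading limited to 1 thread per proc:
print(is_oversubscribed(nproc=8, threads_per_proc=1, n_cores=8))  # False
```

So with nproc set to the number of cores, anything above one thread per proc already asks for more cores than the machine has.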
Not just oversubscribing. Weird cache contention issues maybe. Unclear but it’s broken everywhere and should never be used.
When running on places like USDF with many cores we find we need to use
export OMP_NUM_THREADS=1
export NUMEXPR_MAX_THREADS=1
export OMP_PROC_BIND=false
and are telling people running at scale to use that right now. I haven't bothered on things like my laptop for testing (but maybe I should).
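If you would rather do this from Python than from the shell, a rough equivalent might look like the sketch below. The helper `limit_implicit_threads` is hypothetical (it is not the Rubin function mentioned in this thread), and it assumes it runs before numpy, galsim, or anything else that starts a thread pool is imported, since those libraries typically read these variables when they initialize.

```python
import os

# Same values as the export lines above.
THREAD_ENV = {
    "OMP_NUM_THREADS": "1",
    "NUMEXPR_MAX_THREADS": "1",
    "OMP_PROC_BIND": "false",
}


def limit_implicit_threads(env=os.environ, override=True):
    """Set the thread-limiting variables in env (hypothetical helper).

    With override=False, variables that are already set are left alone.
    Returns the resulting values for the variables in THREAD_ENV.
    """
    for name, value in THREAD_ENV.items():
        if override or name not in env:
            env[name] = value
    return {name: env[name] for name in THREAD_ENV}


applied = limit_implicit_threads()
print(applied)

# Import numpy / galsim / imsim only after this point.
```

Passing `override=False` keeps any value the user has already exported, which may be preferable on a laptop where you want to experiment.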
When @erykoff has his Rubin function ready to turn this all off, we will call that instead (too?). I think @jchiang87 may have a branch with some of this functionality if you want to try it instead. This is some basic issue with one of the libraries we use in Rubin and it also seems machine dependent.
I don't think there is more for us to do here on the imSim side. @jchiang87 do you have a comment?
Right, I think this is handled by #441.
I have
output.nproc: 1
but galsim is using 2 cores. I can get it down to 1 core by setting OMP_NUM_THREADS to 1.