fermi-lat / Likelihood

BSD 3-Clause "New" or "Revised" License

gtdiffrsp creates a very large number of threads - problematic on multi-processor systems #99

Open robincorbet opened 3 years ago

robincorbet commented 3 years ago

In order to add the diffuse columns to the weekly files (which the FSSC provides as a service to the community), I tried to run multiple instances of gtdiffrsp on a machine with 64 physical cores. Running 49 instances simultaneously exceeded the "maxproc" limit on the machine. I could no longer do anything, got logged out, and couldn't log back in until the system admins had killed the processes. The default number of allowed processes was 4096. From the error messages, it appeared that each instance of gtdiffrsp was creating 128 threads, so 128*49 = 6272 > 4096.
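The arithmetic above can be checked with a short shell sketch. The variable names are illustrative, and the figures are the ones reported in this comment. It assumes the limit counts every thread the user owns, so the real headroom is somewhat lower than the computed maximum because login shells and other processes also count.

```shell
#!/bin/sh
# Figures as reported in the comment above; variable names are illustrative.
threads_per_instance=128   # threads each gtdiffrsp instance appeared to create
instances=49               # simultaneous gtdiffrsp runs attempted
max_proc=4096              # default per-user process limit on the machine

# Total threads requested by all instances together.
total=$((threads_per_instance * instances))

# Upper bound on instances that fit under the limit (integer division).
max_instances=$((max_proc / threads_per_instance))

echo "total threads requested: $total (limit: $max_proc)"
echo "instances that fit under the limit: $max_instances"
```

With these numbers, at most 32 instances would fit even if nothing else were running under the same account.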

I tried to work around this by using the "limit" command to set maxproc to 16348. That worked temporarily, but once I'd logged out from the machine I couldn't get back in. (The limit change only seemed to apply to that initial session.) Apart from asking the sys. admins to increase the maxproc limit on the machine, is there a way to get gtdiffrsp to create fewer threads? As far as I could tell from "top", and from the time gtdiffrsp took to run on machines with different numbers of cores, the threads don't seem to be used for multi-processing.

sfegan commented 3 years ago

I had what sounds like a similar problem with some (non-Fermi) code, which turned out to be caused by thread creation in some OMP sections in the FFTW library. In my case, I fixed it by setting the OMP_NUM_THREADS variable to 1 to disable thread creation in OMP. Maybe that would work for you.

export OMP_NUM_THREADS=1
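If changing the variable for the whole session is not wanted, it can also be set for a single command. The sketch below is a minimal illustration: `run_single_threaded` is a hypothetical helper, and the `sh -c` command is only a stand-in that reports what a child process sees. Whether gtdiffrsp's threads actually come from OpenMP has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run any command with OpenMP limited to one thread.
# The variable is set only in that command's environment, not in the session.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}

# Stand-in command that prints the value the child process sees.
# Replace it with the real gtdiffrsp invocation and its usual arguments.
seen=$(run_single_threaded sh -c 'echo "$OMP_NUM_THREADS"')
echo "child sees OMP_NUM_THREADS=$seen"
```

Each of the simultaneous runs could be launched through the same helper, so that no instance inherits the default thread count.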