HenrikBengtsson / parallelly

R package: parallelly - Enhancing the 'parallel' Package
https://parallelly.futureverse.org

availableCores(): Add support for HTCondor #50

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

HTCondor users, I need your help to add support for HTCondor to availableCores():

HPC schedulers such as Slurm, SGE, and Torque/PBS set environment variables that can be queried to figure out how many CPU cores the scheduler has allotted to the job. This allows the job script to adapt to the resources it is allowed to use. For example, when submitting an SGE job that requests four (4) cores:

$ qsub -pe smp 4 my_script.sh

the my_script.sh script can find out how many cores it was given with:

ncores=${NSLOTS:-1}
echo "I am allowed to use $ncores cores on this machine"
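For comparison, the same idea can be stretched across several schedulers with nested defaults. This is only a sketch: the function name scheduler_cores is hypothetical, and SLURM_CPUS_PER_TASK (Slurm, set only when --cpus-per-task is requested) and PBS_NUM_PPN (Torque/PBS) are assumptions about typical setups, not something verified on every site.

```shell
#!/bin/sh
# Sketch: return the first scheduler-provided core count that is set,
# falling back to 1. NSLOTS is the SGE variable used above; the Slurm and
# Torque/PBS variable names are assumptions about typical installations.
scheduler_cores() {
    echo "${SLURM_CPUS_PER_TASK:-${NSLOTS:-${PBS_NUM_PPN:-1}}}"
}

ncores=$(scheduler_cores)
echo "I am allowed to use $ncores cores on this machine"
```

The order of the nesting decides which variable wins when more than one is set, which should rarely matter because a job normally runs under a single scheduler.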

Question: How do you achieve the same on HTCondor? Does HTCondor set environment variables in a similar way, or are there other ways to query the number of cores you've been assigned?


FWIW, I tried to search the web for how to do it, but I failed to find anything useful. The closest I found is in Section 2.5.11 of https://www.mn.uio.no/ifi/tjenester/it/hjelp/beregninger/htcondor/condor-manual.pdf:

HTCondor sets several additional environment variables for each executing job that may be useful for the job to reference.

HenrikBengtsson commented 3 years ago

@fboehm, I see you're suggesting parallelly::availableCores() and you've got a vignette on how to use your qtl2pleio package with HTCondor :+1: Do you happen to know the answer to the above HTCondor-specific questions? I don't have access to HTCondor, so I need help to add support for HTCondor to availableCores().

achubaty commented 5 months ago

I don't have an HTCondor setup handy to test, but the docs say:

CUBACORES, GOMAXPROCS, JULIA_NUM_THREADS, MKL_NUM_THREADS, NUMEXPR_NUM_THREADS, OMP_NUM_THREADS, OMP_THREAD_LIMIT, OPENBLAS_NUM_THREADS, ROOT_MAX_THREADS, TF_LOOP_PARALLEL_ITERATIONS, TF_NUM_THREADS are set to the number of cpu cores provisioned to this job. Should be at least RequestCpus, but HTCondor may match a job to a bigger slot. Jobs should not spawn more than this number of cpu-bound threads, or their performance will suffer. Many third party libraries like OpenMP obey these environment variables.
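If the docs are accurate, a job script could read one of these variables much like NSLOTS on SGE. Below is a minimal, untested sketch: the function name detect_cores and the choice and order of variables checked are my own assumptions, and it has not been run on a real HTCondor pool.

```shell
#!/bin/sh
# Sketch: print the first thread-count variable that holds a positive
# integer. The variable names come from the HTCondor docs quoted above;
# detect_cores is a hypothetical helper, not part of HTCondor or parallelly.
detect_cores() {
    for name in OMP_NUM_THREADS OMP_THREAD_LIMIT MKL_NUM_THREADS OPENBLAS_NUM_THREADS; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        case "$value" in
            ''|*[!0-9]*) continue ;;   # unset, empty, or not a plain integer
        esac
        if [ "$value" -gt 0 ]; then
            echo "$value"
            return 0
        fi
    done
    echo 1   # nothing usable found: assume a single core, like ${NSLOTS:-1}
}

ncores=$(detect_cores)
echo "I am allowed to use $ncores cores on this machine"
```

One caveat: variables such as OMP_NUM_THREADS are also commonly set by users or site profiles outside HTCondor, so a value found this way does not by itself prove that the job is running under HTCondor.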

fboehm commented 5 months ago

@HenrikBengtsson - I'm so sorry that I missed this message (from 3 years ago!) until now. @lmichael107 @CHTC has a lot of HTCondor experience, and she may be able to connect us with others at U. Wisconsin-Madison who might also have answers to some of the above HTCondor questions. I regret that I'm clueless here. My past uses of HTCondor were pretty crude, in the sense that I don't think I ever understood the HTCondor variables or how to integrate them with R package functions, especially the availableCores function.