ComputeCanada / software-stack-config


`LMOD_RC` gets reset within a job #88

Closed mboisson closed 2 months ago

mboisson commented 2 months ago
```
[mboisson@login2 ~]$ env | grep LMOD_RC
LMOD_RC=/cvmfs/soft.computecanada.ca/config/lmod/lmodrc.lua:/cvmfs/soft.computecanada.ca/config/lmod/lmodrc/cache_avx2_intel.lua
[mboisson@login2 ~]$ salloc --time=3:00:00 --cpus-per-task=8 --mem=0
salloc: Granted job allocation 943
salloc: Nodes node1 are ready for job
[mboisson@node1 ~]$ env | grep LMOD_RC
LMOD_RC=/cvmfs/soft.computecanada.ca/config/lmod//lmodrc.lua
```

This is caused by https://github.com/ComputeCanada/software-stack-config/blob/7ee156594ce9ec2181710d4cefbfee5e05dd6df3/profile.d/z-20-lmod.sh#L7, which resets `LMOD_RC` every time the profile script runs. On a regular login, that value is then overridden when the default modules are loaded. That loading happens within this block: https://github.com/ComputeCanada/software-stack-config/blob/7ee156594ce9ec2181710d4cefbfee5e05dd6df3/profile.d/z-20-lmod.sh#L12. The block is skipped when its environment variable is already defined, which is the case within a job. As a result, inside a job the modules are not reloaded (which is good), but `LMOD_RC` still gets reset.

This has minimal impact on clusters because of the CVMFS cache, but it has a large performance impact on our build infrastructure, which uses CephFS instead of CVMFS.

I think we should move the line at https://github.com/ComputeCanada/software-stack-config/blob/7ee156594ce9ec2181710d4cefbfee5e05dd6df3/profile.d/z-20-lmod.sh#L7 inside the block at https://github.com/ComputeCanada/software-stack-config/blob/7ee156594ce9ec2181710d4cefbfee5e05dd6df3/profile.d/z-20-lmod.sh#L12, so that `LMOD_RC` is only reset when the default modules are about to be loaded.
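
To illustrate the idea, here is a rough sketch of what the guarded version could look like. This is not the actual content of `z-20-lmod.sh`: the function name `init_lmod_sketch` and the guard variable `__SKETCH_INIT_DONE` are made up for the example, and the real script uses its own guard and loads the default modules where the comment indicates.

```shell
# Hypothetical stand-in for the profile.d script; all names are illustrative.
init_lmod_sketch() {
    # Guard: true only the first time, e.g. on login. Inside a job the
    # guard variable is inherited from the submitting shell, so the
    # block is skipped. (__SKETCH_INIT_DONE is an assumed name.)
    if [ -z "${__SKETCH_INIT_DONE:-}" ]; then
        # Moved inside the guard: the default LMOD_RC is only set here,
        # so an inherited, already-extended value is left untouched.
        LMOD_RC="/cvmfs/soft.computecanada.ca/config/lmod/lmodrc.lua"
        export LMOD_RC

        # The real script would load the default modules here,
        # which may append cache entries to LMOD_RC.

        __SKETCH_INIT_DONE=1
        export __SKETCH_INIT_DONE
    fi
}
```

With this layout, running the function a second time in an environment where the guard and an extended `LMOD_RC` are already present leaves `LMOD_RC` as it was.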

mboisson commented 2 months ago

The PR has been merged. Closing.