TACC / Lmod

Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy
http://lmod.readthedocs.org

How to make `sbatch` jobs run Lmod setup? (ubuntu bash) #676

Closed simonLeary42 closed 10 months ago

simonLeary42 commented 10 months ago

The Lmod install instructions say to add symlinks in /etc/profile.d to set up the module function and other environment variables. But a Slurm batch job runs in a non-login, non-interactive shell, which means these scripts are not run. How do you get them to run in this case?

rtmclay commented 10 months ago

This is discussed here:

https://lmod.readthedocs.io/en/latest/030_installing.html?highlight=BASH_ENV

Note that the Lmod startup scripts typically set BASH_ENV and export the module and ml commands. Are these two things not working for you?

simonLeary42 commented 10 months ago

No, they are working. I actually put other logic in /etc/profile.d that was meant to change the MODULEPATH environment variable. I didn't know that Lmod set BASH_ENV; I thought the module function was just carried over into the sbatch job from the login node. In that case, I think I can put my MODULEPATH directories in /etc/lmod/.modulespath instead, and the Lmod BASH_ENV script should take care of it?

rtmclay commented 10 months ago

If BASH_ENV points to init/bash then it only controls the definition of the module and ml commands. The value of MODULEPATH can be controlled by init/profile or anything else in /etc/profile.d. Whatever the value of $MODULEPATH is when a user does sbatch job_script.sh should be the value of $MODULEPATH in the script. Or am I missing something?
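The BASH_ENV mechanism described above can be checked without Lmod or Slurm. The following is a minimal sketch, assuming only a stock bash: a non-interactive bash sources the file named by BASH_ENV before running its command. The function name `demo_module` is a hypothetical stand-in for Lmod's `module` function, and the temporary file stands in for Lmod's init/bash.

```shell
#!/bin/bash
# Sketch: a non-interactive bash sources the file named by BASH_ENV.
# demo_module is a hypothetical stand-in for Lmod's module function.
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
demo_module() { echo "demo_module called with: $*"; }
EOF

# Without BASH_ENV, the child shell has no definition of the function.
without=$(env -u BASH_ENV bash -c 'type -t demo_module || echo missing')

# With BASH_ENV, the child sources the file first, so the function exists.
with=$(BASH_ENV="$env_file" bash -c 'demo_module load gcc')

echo "without BASH_ENV: $without"
echo "with BASH_ENV:    $with"
rm -f "$env_file"
```

Note that the file only defines the function here; it does not touch any path variable, which mirrors the point that init/bash controls the command definitions and not MODULEPATH.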

simonLeary42 commented 10 months ago

We don't expect users to set $MODULEPATH in their batch scripts. What if I just set BASH_ENV=/etc/profile?

rtmclay commented 10 months ago

No. We at TACC do not either. When the user logs in, they get $MODULEPATH set by /etc/profile.d. Then the bash script inherits the environment, which includes $MODULEPATH. Does that not work for you?

You could set BASH_ENV to be /etc/profile. But then it might change env vars that the user sets.

Maybe the difference is that here at TACC we pass the current environment to the job submission script. Maybe your site does not.

Also you could encourage your users to start their bash submission scripts with:

#!/bin/bash -l
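The effect of the `-l` flag can be seen in isolation. This is a rough sketch, assuming a stock bash: it points HOME at a throwaway directory so that a login shell reads a test `.bash_profile`, while a plain non-interactive shell reads nothing. The variable `DEMO_MODULEPATH` and the `/opt/demo/modulefiles` path are hypothetical and only illustrate the behavior; on a real cluster the login shell would also read /etc/profile and /etc/profile.d.

```shell
#!/bin/bash
# Sketch: a login shell (-l) reads profile files; a plain non-interactive
# shell does not. DEMO_MODULEPATH and /opt/demo are hypothetical.
fake_home=$(mktemp -d)
echo 'export DEMO_MODULEPATH=/opt/demo/modulefiles' > "$fake_home/.bash_profile"

# Plain non-interactive shell: no profile files are read.
plain=$(HOME="$fake_home" env -u BASH_ENV bash -c 'echo "${DEMO_MODULEPATH:-unset}"')

# Login shell: /etc/profile and then ~/.bash_profile are read first.
# tail keeps only our line in case the system profile prints something.
login=$(HOME="$fake_home" env -u BASH_ENV bash -l -c 'echo "${DEMO_MODULEPATH:-unset}"' 2>/dev/null | tail -n 1)

echo "plain shell: $plain"
echo "login shell: $login"
rm -rf "$fake_home"
```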

simonLeary42 commented 10 months ago

That's normally how it works, but we're a heterogeneous cluster and MODULEPATH needs to change according to CPU architecture at the start of a new job. It's only a problem for jobs on non-x86 nodes.

rtmclay commented 10 months ago

Then setting BASH_ENV=/etc/profile should work. It should be the same as running the script with #!/bin/bash -l.
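One consequence of this approach, already raised in the thread, is that a file sourced through BASH_ENV runs after the job has inherited its environment, so anything it assigns replaces what the user passed in. A minimal sketch, assuming a stock bash: the temporary file `site_profile` stands in for /etc/profile, and `DEMO_VAR` is a hypothetical variable a user might pass through.

```shell
#!/bin/bash
# Sketch: a file sourced via BASH_ENV runs after the environment is
# inherited, so its assignments replace values the user passed in.
# site_profile stands in for /etc/profile; DEMO_VAR is hypothetical.
site_profile=$(mktemp)
echo 'export DEMO_VAR=site_default' > "$site_profile"

# The user supplies their own value in the environment of the child shell.
seen=$(DEMO_VAR=user_choice BASH_ENV="$site_profile" bash -c 'echo "$DEMO_VAR"')

echo "script sees: $seen"
rm -f "$site_profile"
```

The child shell reports the site value, not the user's, which is the conflict to weigh before pointing BASH_ENV at a broad file such as /etc/profile.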

simonLeary42 commented 10 months ago

So if Lmod makes the module function show up in bash slurm jobs by exporting BASH_ENV, how does it make the module function show up in other shells like zsh?

rtmclay commented 10 months ago

Zsh and csh/tcsh always source the appropriate shell startup scripts for non-interactive shells that are used by shell scripts. Note that zsh or csh users running a bash script will use $BASH_ENV.

I am less of an expert for shells like rc, ksh and fish.

simonLeary42 commented 10 months ago

I decided I'm not going to mess with BASH_ENV and instead just put my MODULEPATH logic into the Slurm prolog script. BASH_ENV=/etc/profile would override the user's --export arguments if there were any conflicts.
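The thread does not show the prolog itself. The following is only a sketch of what such logic might look like, under two assumptions: that the site uses Slurm's TaskProlog hook, where lines the script prints in the form `export NAME=value` are applied to the task's environment, and that module trees are laid out per architecture. The `/opt/modulefiles/<arch>` paths and the helper `modulepath_for_arch` are hypothetical.

```shell
#!/bin/bash
# Sketch of a Slurm TaskProlog script (assumption: the site uses TaskProlog,
# where printed "export NAME=value" lines are applied to the task environment).
# The /opt/modulefiles/<arch> layout is hypothetical.

modulepath_for_arch() {
    # Map the machine architecture to a module tree.
    case "$1" in
        x86_64)        echo "/opt/modulefiles/x86_64" ;;
        aarch64|arm64) echo "/opt/modulefiles/aarch64" ;;
        ppc64le)       echo "/opt/modulefiles/ppc64le" ;;
        *)             echo "/opt/modulefiles/noarch" ;;
    esac
}

# uname -m runs on the compute node, so each job gets the tree for its node.
arch=$(uname -m)
echo "export MODULEPATH=$(modulepath_for_arch "$arch")"
```

Because this sets only MODULEPATH, it leaves every other variable from the user's submission environment alone, which is the motivation stated above for preferring it over BASH_ENV=/etc/profile.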

simonLeary42 commented 10 months ago

Note: /usr/share/lmod/lmod/init/profile would be much less likely to override the user's --export values than /etc/profile.

rtmclay commented 10 months ago

True.