EESSI / software-layer

Software layer of the EESSI project
https://eessi.github.io/docs/software_layer
GNU General Public License v2.0

Minimise initialisation script to only initialise Lmod #261

Open ocaisa opened 1 year ago

ocaisa commented 1 year ago

With the compat layer, we can have multiple OSes side by side without issue. The only reason we can't do so right now is our init script, which adds compat layer paths to the environment. I think we should move almost all of our init script into a module file; an initial attempt has already been made in #68.

In the past this was tricky because we needed archspec to determine the architecture, but if our bash-only approach (#187) is reliable this would no longer be a restriction (we can use bash from the host). Deciding the architecture could even be done in Lua as part of the LMOD_RC (which may even make us resilient against exported environment variables in a job context).

Then, for each compat layer, you can have a gateway module (with an Lmod family, so you can't have two loaded at once) that gives you access to that particular compat layer (and its associated EB stack).

As a major plus, this would also mean we would be able to leverage Lmod to work in different shell environments.

The only thing remaining in the initialisation script would be initialising Lmod. It wouldn't even matter which Lmod version is initialised, as we should be able to switch versions to match the pilot version as part of the compat module file (if this were considered necessary).

ocaisa commented 1 year ago

This would also give us a way of documenting the compat layer (and any important differences it may have).

ocaisa commented 1 year ago

In terms of using old software on new hardware, the archdetect bash script will probably have to return multiple values to try, something like x86_64/intel/skylake_avx512:x86_64/intel/haswell:x86_64/generic. If the path exists under /cvmfs/pilot.eessi-hpc.org/versions/XXX/software/linux it is used; otherwise the next option is tried. That way we can always use the latest version of archdetect.
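The fallback described here could be sketched in bash roughly as follows. This is only an illustration, not what archdetect does today: the function name first_existing_cpu_path and its two-argument layout are hypothetical, and the caller is assumed to pass the software/linux directory of the EESSI version in question as the base directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first candidate from a colon-separated list
# whose directory exists under base_dir; return non-zero if none exists.
first_existing_cpu_path() {
    local base_dir="$1"      # assumed: the software/linux directory of one EESSI version
    local candidates="$2"    # e.g. "x86_64/intel/skylake_avx512:x86_64/intel/haswell:x86_64/generic"
    local candidate
    local -a candidate_list

    # Split the colon-separated list into an array, most specific target first.
    IFS=':' read -r -a candidate_list <<< "$candidates"

    for candidate in "${candidate_list[@]}"; do
        if [ -d "${base_dir}/${candidate}" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Because the existence check happens at lookup time, the same (latest) detection output could be used against any version of the stack: an older version that was never built for skylake_avx512 would simply fall through to haswell or generic.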

trz42 commented 1 year ago

I wonder what use cases are not possible with the current approach.

ocaisa commented 1 year ago

Currently you cannot (automatically) reverse the initialisation: the PATH entries to the compat layer remain, as do the additions to the MODULEPATH. This makes it a little dangerous to source multiple compat layers, as tools from one may leak into another.
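To illustrate what a manual reversal involves today, here is a rough bash sketch that filters a colon-separated path list. The helper name strip_prefix_from_pathlist is hypothetical, and it assumes everything the init script added lives under a single known prefix, which may not hold in practice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a colon-separated path list (PATH, MODULEPATH, ...)
# with every entry at or below the given prefix removed.
# Empty entries are dropped as well.
strip_prefix_from_pathlist() {
    local prefix="$1"
    local pathlist="$2"
    local entry result=""
    local -a entries

    IFS=':' read -r -a entries <<< "$pathlist"

    for entry in "${entries[@]}"; do
        case "$entry" in
            # Skip empty entries, the prefix itself, and anything below it.
            ""|"$prefix"|"$prefix"/*) continue ;;
        esac
        result="${result:+${result}:}${entry}"
    done
    printf '%s\n' "$result"
}
```

It would be used along the lines of `PATH=$(strip_prefix_from_pathlist /cvmfs/pilot.eessi-hpc.org "$PATH")`, and again for MODULEPATH. Having to hand-roll this kind of clean-up is exactly what unloading a module would make unnecessary.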

This means that, in general, we can't reliably mix and match software from different compat layers. Module files (and the use of an Lmod family) would provide a safe and documented way to do this. With that approach we would no longer need to build old software with new compat layers (and deal with the fallout); we could provide a global view which includes all compat layers.

ocaisa commented 1 year ago

@boegel I just realised that this is even more important. Right now, it is not possible to initialise a different version of EESSI if you have already initialised one. Our current init scripts assume certain actions if EESSI-related envvars are set in the environment. This means that unless you know which variables to unset, you cannot escape the existing EESSI version (so if EESSI is your default environment, as it is for me in Magic Castle, you cannot easily try another version).
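As a stop-gap, something like the following bash sketch could clear the variables before sourcing another version's init script. It rests on an assumption that is not guaranteed: that every relevant variable has a name starting with EESSI. The function name unset_eessi_variables is hypothetical, and the sketch does nothing about PATH or MODULEPATH entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unset every shell variable whose name starts with EESSI.
# Assumes the init script only keeps state in variables with that prefix.
unset_eessi_variables() {
    local var
    # "${!EESSI@}" expands to the names of all variables beginning with EESSI.
    for var in "${!EESSI@}"; do
        unset "$var"
    done
}
```

This needs to run in the current shell (as a sourced function), not in a subshell, for the change to take effect.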

ocaisa commented 11 months ago

The Lmod feature source_sh() may be enough for us to figure out the architecture. A simple script containing

export ARCHITECTURE_PATH=$(/home/ocaisa/software-layer/init/eessi_archdetect.sh cpupath 2> /dev/null)

can be created, together with a module file containing:

source_sh("bash", "/home/ocaisa/test_lmod/script.sh")

which will set ARCHITECTURE_PATH via Lmod as a result.