MiDataInt / mdi-pipelines-framework

Stage 1 pipelines framework for the Michigan Data Interface
MIT License

Lmod modules as alternative to conda environments #16

Open wilsonte-umich opened 2 years ago

wilsonte-umich commented 2 years ago

At present, MDI Stage 1 pipelines enforce program version control using conda environments, which is stable and effective but a bit slow on first use.

Many institutional HPC servers, including UM Great Lakes, use Lmod to provide supported, versioned programs.

One possible feature would allow developers to use an 'lmod' dictionary in their pipeline.yml files instead of the typical 'conda' dictionary. Thus,

conda:
  - R=4.1.0

might become:

lmod:
  - R/4.1.0
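As a rough illustration of how closely the two notations line up, here is a minimal Python sketch that rewrites a conda-style entry as an Lmod-style one. The function name `conda_spec_to_lmod` is hypothetical and not part of the MDI framework, and the sketch assumes the Lmod module is named exactly like the conda package, which will not hold on every server.

```python
def conda_spec_to_lmod(spec: str) -> str:
    """Rewrite a conda-style 'name=version' entry as an Lmod-style 'name/version'.

    Hypothetical helper for illustration only. It handles just the exact
    'name=version' form shown in the issue, not ranges such as 'R>=4'.
    """
    name, sep, version = spec.strip().partition("=")
    if not name:
        raise ValueError(f"empty package name in {spec!r}")
    if version.startswith("=") or any(c in spec for c in "<>"):
        raise ValueError(f"only 'name=version' entries are supported: {spec!r}")
    # A bare package name (no version) maps to a bare module name.
    return f"{name}/{version}" if sep and version else name


print(conda_spec_to_lmod("R=4.1.0"))  # R/4.1.0
```

In practice module names and available versions differ between servers, which is the portability concern with a pure 'lmod' dictionary.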

The advantages of Lmod support would be faster first launch and potentially easier communication with sys admins. The disadvantage is that it would be platform specific, decreasing portability of pipelines.

One possible solution would be to expect 'conda' but allow 'lmod' as a bypass alternative if a user declares use of a specific server environment. Another would be to attach Lmod fallbacks to conda entries, so that a program is skipped in the conda environment if its associated Lmod lookup succeeds. This would remain portable while allowing a streamlined, platform-specific launch. It might look like this:

conda:
  - R=4.1.0 Lmod=R/4.1.0
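The fallback logic could be sketched roughly as follows. This is Python for illustration only, since the framework itself would do this in conda.pl. The names `split_entry` and `resolve`, the `is_available` callback, and the `samtools=1.13` entry are all hypothetical. On a real server `is_available` would wrap some Lmod query; here it is a plain function so the sketch runs anywhere.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def split_entry(entry: str) -> Tuple[str, Optional[str]]:
    """Split 'R=4.1.0 Lmod=R/4.1.0' into ('R=4.1.0', 'R/4.1.0').

    Entries without an 'Lmod=' token return None for the module name.
    """
    conda_spec: Optional[str] = None
    lmod_name: Optional[str] = None
    for token in entry.split():
        if token.startswith("Lmod="):
            lmod_name = token[len("Lmod="):]
        elif conda_spec is None:
            conda_spec = token
        else:
            raise ValueError(f"unexpected token {token!r} in {entry!r}")
    if conda_spec is None:
        raise ValueError(f"no conda spec found in {entry!r}")
    return conda_spec, lmod_name


def resolve(
    entries: Iterable[str],
    is_available: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Return (modules to load, conda specs still needed in the environment)."""
    modules: List[str] = []
    conda_specs: List[str] = []
    for entry in entries:
        conda_spec, lmod_name = split_entry(entry)
        if lmod_name and is_available(lmod_name):
            # Lmod lookup succeeded: skip this program in the conda environment.
            modules.append(lmod_name)
        else:
            # No Lmod fallback declared, or the module is missing: use conda.
            conda_specs.append(conda_spec)
    return modules, conda_specs


# Hypothetical server where only R/4.1.0 is installed as a module.
installed = {"R/4.1.0"}
entries = ["R=4.1.0 Lmod=R/4.1.0", "samtools=1.13"]
modules, conda_specs = resolve(entries, lambda name: name in installed)
print(modules)      # ['R/4.1.0']
print(conda_specs)  # ['samtools=1.13']
```

On a server without the module, the same entries resolve entirely to conda, which is what keeps the pipeline portable.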

Implementation would not be overly difficult and would mainly involve editing conda.pl in the pipelines framework.

wilsonte-umich commented 2 years ago

As of v0.3.0, the MDI now supports Singularity containers for running Stage 1 pipelines, which eases the burden of conda environment building for end users because that work can be done in the container image by the developer. Container download is prompted, easy, and fairly fast. Thus, Lmod support is now less obviously valuable.

The second (fallback) syntax described above could still be useful to some developers, so I will leave this issue open for now.