TACC / Lmod

Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy
http://lmod.readthedocs.org

Lmod exception with .version file in TCL module file #614

Closed nrcfieldsa closed 1 year ago

nrcfieldsa commented 1 year ago

Describe the bug

Our Lmod installation recently began failing on certain TCL module files that contain a setenv command and make use of a TCL module_name/.version file.

To Reproduce

ubuntu2004:~$ export MODULEPATH="/path/to_tcl/modules/modulefiles"
ubuntu2004:~$ module --version

Modules based on Lua: Version 8.7.2  2022-05-04 13:42 -05:00
    by Robert McLay mclay@tacc.utexas.edu

# TCL file included that implements site-specific logic for paths and OS support:
ubuntu2004:~$ grep setenv /path/to_tcl/modules/modulefiles/module/0.1
setenv MODULE_DEPOT     $depot
setenv MODULE_TARGET            $MODULE_TARGET

ubuntu2004:~$ module avail
invalid command name "setenv"
    while executing
"setenv MODULE_DEPOT    $depot"
    (file "/path/to_tcl/modules/modulefiles/depot/0.1" line 65)
    invoked from within
"source /path/to_tcl/modules/modulefiles/depot/0.1"
    (file "/path/to_tcl/share/modules/modulefiles/picard-tools/.version" line 7)
    invoked from within
"source $mRcFile"
    (procedure "main" line 15)
    invoked from within
"main $fn"
    (file "/opt/common/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.4.0/lmod-8.7.2-xdeyxleayxxciid45dpl6h4sv4qhpicy/lmod/8.7.2/libexec/RC2lua.tcl" line 137)
Lmod has detected the following error:  Unable to parse:
"/path/to_tcl/modules/modulefiles/picard-tools/.version". Aborting!

The contents of the TCL .version file that causes the issue are as follows:

ubuntu2004:~$ cat /path/to_tcl/modules/modulefiles/picard-tools/.version
#%Module1.0#####################################################################
##
##
# for Tcl script use only
#

source /path/to_tcl/modules/modulefiles/depot/0.1

switch -regexp -- $MODULE_TARGET {
    {U18} {
        set ModulesVersion      picard-2.19.0-gcc-7.3.0-twenaxi
    }
    {C7} {
        set ModulesVersion      picard-2.19.0-gcc-8.3.0-37schxi
    }

    default {
        set ModulesVersion      2.3.8
    }
}

#
################################################################################

There is a chance that the module files can be re-written or start using spack generated module files. However, that doesn't address the potential issue with .version file using TCL commands.

Issue can be reproduced when using bug_report_template.sh with contents:

#!/bin/bash
# -*- shell-script -*-

. $LMOD_ROOT/lmod/init/bash

export MODULEPATH=$PWD/my_modules
module avail

The problematic modules have been placed in:

ubuntu2004:~/work/Lmod-bugReport/my_modules/picard-tools$ ls -al
total 6
drwxrwxr-x 2 usr001 nrc 4096 Dec  7 19:31 .
drwxr-xr-x 7 usr001 nrc 4096 Dec  7 19:17 ..
-rw-r--r-- 1 usr001 nrc  525 Dec 10  2021 .version
-rw-r--r-- 1 usr001 nrc  797 Dec 10  2021 1.119
-rw-r--r-- 1 usr001 nrc  797 Dec 10  2021 2.3.8
lrwxrwxrwx 1 usr001 nrc  102 Oct 24  2019 picard-2.19.0-gcc-7.3.0-twenaxi -> /opt/U18/spack/share/spack/modules/linux-ubuntu18.04-x86_64/picard-2.19.0-gcc-7.3.0-twenaxi
lrwxrwxrwx 1 usr001 nrc   97 Oct 24  2019 picard-2.19.0-gcc-8.3.0-37schxi -> /opt/C7/spack/share/spack/modules/linux-centos7-x86_64/picard-2.19.0-gcc-8.3.0-37schxi

ubuntu2004:~/work/Lmod-bugReport$ cd my_modules/depot/; ls -al|sed -e 's/sits001/usr001/g' -e 's/nrc_its/nrc/'
total 3
drwxr-xr-x 2 usr001 nrc 4096 Dec  7 19:43 .
drwxr-xr-x 7 usr001 nrc 4096 Dec  7 19:17 ..
-rw-r--r-- 1 usr001 nrc 1904 Dec  1 23:53 0.1

Error output from bug_report:

ubuntu2004:~/work/Lmod-bugReport$ env -i LMOD_ROOT=$LMOD_ROOT USER=$USER ./bug_report_template.sh
invalid command name "setenv"
    while executing
"setenv MODULE_DEPOT    $depot"
    (file "/home/usr001/work/Lmod-bugReport/my_modules/depot/0.1" line 65)
    invoked from within
"source /home/usr001/work/Lmod-bugReport/my_modules/depot/0.1"
    (file "/home/usr001/Lmod-bugReport/my_modules/picard-tools/.version" line 7)
    invoked from within
"source $mRcFile"
    (procedure "main" line 15)
    invoked from within
"main $fn"
    (file "/opt/common/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-9.4.0/lmod-8.7.2-xdeyxleayxxciid45dpl6h4sv4qhpicy/lmod/lmod/libexec/RC2lua.tcl" line 137)

------------------- /home/usr001/Lmod-bugReport/my_modules -------------------
   Compiler/gcc/10/boost/1.9         picard-tools/picard-2.19.0-gcc-7.3.0-twenaxi
   Compiler/gcc/10/mpich/10.0        picard-tools/picard-2.19.0-gcc-8.3.0-37schxi
   Core/gcc/10.0                     picard-tools/1.119
   MPI/gcc/10/mpich/10/phdf5/1.14    picard-tools/2.3.8                           (D)
   depot/0.1

  Where:
   D:  Default Module

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

Expected behavior

Most TCL modules appear to work OK with Lmod. It is expected that the module avail command would not fail when examining TCL module files with a setenv or switch line in them.

Setenv is said to be supported by Lmod and also appears as a supported command in environment-module documentation.

The environment modules documentation at: https://modules.readthedocs.io/en/latest/modulefile.html shows that the setenv command is valid modulefile syntax.

setenv [--set-if-undef] variable value
Set environment variable to value.  The setenv command will also change the process' environment. [..]

Additional context

The setenv command is in a module file that has worked well in the past with environment-modules: Modules Release 4.1.1 (2018-02-17).

rtmclay commented 1 year ago

This is not supported by Lmod. The files .modulerc and .version only support commands like:

set ModulesVersion      2.3.8

It does not support module commands.
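To find other files in a tree that may hit the same limitation, something like the following sketch could be used. It lists .version and .modulerc files that contain a `source` line or a few common modulefile commands. The function name `find_suspect_rc_files` and the command list in the pattern are assumptions for illustration, not a list taken from Lmod itself.

```shell
#!/bin/bash
# Hypothetical helper: print .version/.modulerc files under a module tree
# that contain more than plain version selection.
# The pattern is an assumption; extend it with whatever commands your
# site-specific files use.
find_suspect_rc_files() {
    local tree="$1"
    local pattern='^[[:space:]]*(source|setenv|prepend-path|append-path)[[:space:]]'
    # grep -l prints only the names of files with at least one matching line.
    find "$tree" -type f \( -name .version -o -name .modulerc \) \
        -exec grep -lE "$pattern" {} +
}
```

For example, `find_suspect_rc_files "$PWD/my_modules"` would be expected to report picard-tools/.version from this report, because that file sources depot/0.1.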

nrcfieldsa commented 1 year ago

As it turns out, we can maintain a separate tree of Lua modules for Ubuntu 20 and simply not use Lmod for the older TCL modulefiles tree used in earlier OS releases. Unless anyone needs this feature, I guess this issue can be resolved.

nrcfieldsa commented 1 year ago

Just out of curiosity, would this be possible with the embedded TCL interpreter, as described here: https://easybuild.io/eum21/010_eum21_Lmod.pdf

Embedded TCL interpreter

  • Lmod now embeds the TCL interpreter.
    • Speeds up avail and load when there are many “.version” or “.modulerc” files.
    • It is still faster to use “.modulerc.lua” files over TCL version files.

rtmclay commented 1 year ago

That is not the issue. All .modulerc and .version files are interpreted by the file src/RC2lua.tcl. That file has to make sense of the commands that are there. What does setenv mean in the context of a .version file? What do any of the other commands mean (prepend-path, etc.)? All the .modulerc and .version files do is set the default version and do alias stuff. setenv etc. don't apply.

There are no clear rules that I know of. It doesn't mean that they can't be defined but they haven't been.

nrcfieldsa commented 1 year ago

Prior to Lmod: The setenv line is in our site-standard "depot" module, which is sourced at the beginning of a TCL module file to point to the correct package repository $depot on the cluster, depending on the OS and cluster cell. Thus, a flat app/release namespace in share/modules/modulefiles maps to multiple install prefix paths specific to each compiler, MPI release, OS and architecture, when not held in common.

It may not be intentional to have the .version file define such an environment variable itself; rather it is being used in .version to switch to the appropriate release of multiple modulefiles for a program. It is also used in the modulefile itself to build the binary and library path prefix.

Our use case presently has been:

  1. module avail scans for modules to read in $MODULEPATH;
  2. the .version file is located by Lmod, under the picard-tools/ and other module directories where multiple versions of the module file exist;
  3. the depot/0.1 module is sourced in order to establish the prefix path to find particular installation;
  4. the setenv line in depot/0.1 then sets an environment variable for consumption by further TCL modules and/or programs that might construct a path matching the environment currently logged into;
  5. the variable is therefore implicitly defined inside .version, but it could perhaps be defined outside of .version and just referenced there, and from the modulefiles used to load software, as a regular environment variable not handled by environment-modules.

This mechanism is not compatible with the double interpretation of statements in the TCL file into Lua: the variable being set, whether as a regular variable or an environment variable, may now be out of scope for the calling module that sources it, and setenv commands are not interpreted in RC2lua.tcl.

One approach is to move toward Lmod hierarchical modules, or to have a fixed directory structure with OS_REL and ARCH, as done with either EB or spack, while .profile is used to select which $MODULEPATH to use. Thus, if you have Ubuntu 20.04 on AMD Epyc (zen2) and Intel Xeon (cascadelake) nodes, one with gcc and one with intel, each would have its own module path, and no logic is used in the modulefiles themselves to switch between program paths. Instead you'd have multiple copies of the same module files, one in each respective path.
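A minimal sketch of that .profile selection follows. The directory layout (`<root>/<os_id><os_version>/<arch>/modulefiles`), the function name `modulepath_for`, and the `/opt/modules` root are all hypothetical; the real layout would follow whatever EB or spack generates at your site.

```shell
#!/bin/bash
# Hypothetical helper: build one MODULEPATH entry per OS release and
# architecture, so no switching logic is needed inside the modulefiles.
# Assumed layout: <root>/<os_id><os_version>/<arch>/modulefiles
modulepath_for() {
    local root="$1" os_id="$2" os_version="$3" arch="$4"
    printf '%s/%s%s/%s/modulefiles\n' "$root" "$os_id" "$os_version" "$arch"
}

# Possible use in ~/.profile. ID and VERSION_ID come from /etc/os-release;
# SITE_ARCH (e.g. zen2 or cascadelake) is assumed to be set per node by
# the site, since `uname -m` only reports the generic machine type.
#
#   . /etc/os-release
#   export MODULEPATH="$(modulepath_for /opt/modules "$ID" "$VERSION_ID" "$SITE_ARCH")"
```

Keeping the path construction in a single function means the per-node decision is made once at login rather than each time a .version file is parsed.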

nrcfieldsa commented 1 year ago

This issue can be resolved as 'not a bug'. The TCL statements listed are not supported in current Lmod version. Lmod hierarchical modules fill the need I have for this task.

A work-around for any similar issue is to move the logic out of the .version and/or module file, such as into the environment of the shell.
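One way to do that, sketched under assumptions, is to evaluate the switch outside Lmod and write a static .version file that contains only a `set ModulesVersion` line, which is the form described above as supported. The target labels and version strings are copied from the .version file in this report; the function names are hypothetical, and the sketch assumes each target has its own module tree so that one static file per directory is enough.

```shell
#!/bin/bash
# Map a site target label to a default version. Like the original
# `switch -regexp`, the label may match anywhere in the string.
default_version_for() {
    case "$1" in
        *U18*) echo "picard-2.19.0-gcc-7.3.0-twenaxi" ;;
        *C7*)  echo "picard-2.19.0-gcc-8.3.0-37schxi" ;;
        *)     echo "2.3.8" ;;
    esac
}

# Write a static .version file into the given module directory.
# The file contains no source, switch, or setenv statements.
write_static_version() {
    local dir="$1" target="$2"
    {
        printf '#%%Module1.0\n'
        printf 'set ModulesVersion %s\n' "$(default_version_for "$target")"
    } > "$dir/.version"
}
```

A call such as `write_static_version my_modules/picard-tools "$MODULE_TARGET"` would then be run once per tree, for example from a deployment script, instead of on every `module avail`.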

In the future, if someone else needs more in-depth TCL module file compatibility, they could open a feature request.