TACC / Lmod

Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy
http://lmod.readthedocs.org

LMOD_CACHED_LOADS causes non-zero exit code when loading a module #613

Closed smoors closed 1 year ago

smoors commented 1 year ago

Describe the bug

If LMOD_CACHED_LOADS is set to yes or 1, loading a module still works but returns exit code 1, without any error message.

To Reproduce

export LMOD_CACHED_LOADS=yes
module load foss/2022a
echo $?  # returns 1

Expected behavior

I would expect this to return 0, or is this intended behavior?

Desktop (please complete the following information):


Modules based on Lua: Version 8.7.14  2022-11-01 10:59 -05:00
    by Robert McLay mclay@tacc.utexas.edu

Changes from Default Configuration
----------------------------------

Name                         Where Set  Default      Value
----                         ---------  -------      -----
LFS_VERSION                  D          1.6.3        1.8.0
LMOD_CACHED_LOADS            D          no           yes
LMOD_HAVE_LUA_TERM           C          no           yes
LMOD_PACKAGE_PATH            D          nil          <empty>
LMOD_PAGER                   C          less         /usr/bin/less
LMOD_SYSTEM_DEFAULT_MODULES  D          __unknown__  <empty>
LMOD_SYSTEM_NAME             E          false        hydra-skylake-ib
LMOD_TCLSH                   C          tclsh        /usr/bin/tclsh
MODULEPATH_ROOT              C                       /data/brussel/100/vsc10009/software/lmod/lmod-8.7.14/modulefiles
PATH_TO_LUA                  C          lua          /usr/bin/lua

Where Set -> D: default, E: environment, C: configuration
             lmod_cfg: lmod_config.lua SitePkg: SitePackage StdPkg: StandardPackage
             Other: Set somewhere outside of normal locations

rtmclay commented 1 year ago

I just ran the following with Lmod 8.7.14

$ export LMOD_CACHED_LOADS=yes
$ module load gmt
$ echo $?
0

So this is not a general problem with using LMOD_CACHED_LOADS=yes.

Please follow the instructions included with the bug_report template to provide a working test case that shows the issue. Please use a small module tree.

rtmclay commented 1 year ago

Do you have a test case for this issue, or can I close it?

smoors commented 1 year ago

Thanks for your answer. I haven't found time to create a test case yet; I will try to do it this week.

smoors commented 1 year ago

I traced this down to a caching error on a modulefile that modifies the MODULEPATH:

prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

I guess that makes sense. Is there a way to do this that does not cause the following error?

/usr/bin/lua: /usr/share/lmod/lmod/libexec/Spider.lua:567: stack overflow
stack traceback:
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        ...
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:567: in function 'l_search_mpathParentT'
        /usr/share/lmod/lmod/libexec/Spider.lua:582: in function 'l_build_keepT'
        /usr/share/lmod/lmod/libexec/Spider.lua:598: in function 'buildDbT'
        /usr/share/lmod/lmod/libexec/Cache.lua:630: in function 'build'
        /usr/share/lmod/lmod/libexec/spider:461: in function 'main'
        /usr/share/lmod/lmod/libexec/spider:842: in main chunk
        [C]: ?

rtmclay commented 1 year ago

In general, there is probably no way to prevent all such errors. The spider cache is designed to walk all changes to $MODULEPATH.

It may be possible to detect the endless loop. If you can give me a test case that shows this failure, I'll take a look at it.

smoors commented 1 year ago

While trying to create a minimal example, I discovered that it's actually not the MODULEPATH change itself that causes the failure, but the use of an environment variable, YALES2_HOME, that was set in the same module file:

setenv("YALES2_HOME", pathJoin(os.getenv("VSC_SCRATCH"), "yales2"))
prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

of course, this can be trivially fixed:

local yales2_home = pathJoin(os.getenv("VSC_SCRATCH"), "yales2")
setenv("YALES2_HOME", yales2_home)
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))
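
A small module tree for comparing the two variants could be set up roughly as follows. This is only a sketch: the temporary directory, the module name `yales2` and the version names `broken` and `fixed` are made up for illustration, and whether the failing variant actually triggers the stack overflow may depend on the rest of the module tree.

```shell
# Sketch: build a throwaway module tree holding both variants of the modulefile.
# The names yales2/broken.lua and yales2/fixed.lua are hypothetical.
tree=$(mktemp -d)              # root of the test module tree
mkdir -p "$tree/yales2"

# Variant from the report: reads YALES2_HOME back through os.getenv()
cat > "$tree/yales2/broken.lua" <<'EOF'
setenv("YALES2_HOME", pathJoin(os.getenv("VSC_SCRATCH"), "yales2"))
prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))
EOF

# Fixed variant: keeps the path in a local Lua variable
cat > "$tree/yales2/fixed.lua" <<'EOF'
local yales2_home = pathJoin(os.getenv("VSC_SCRATCH"), "yales2")
setenv("YALES2_HOME", yales2_home)
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))
EOF

echo "module tree written to: $tree"
echo "try: module use $tree; export LMOD_CACHED_LOADS=yes; module load yales2/fixed; echo \$?"
```

Both files assume that VSC_SCRATCH is set in the environment, as in the original modulefile.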

thanks a lot for your help!

wpoely86 commented 1 year ago

@rtmclay We're a bit puzzled by this. I thought Lmod only pushed changes to the environment at the end of a load, and so something like:

setenv("YALES2_HOME", pathJoin(os.getenv("VSC_SCRATCH"), "yales2"))
prepend_path("MODULEPATH", pathJoin(os.getenv("YALES2_HOME"), "modules"))

should not work. But it does work (without cached loads). What am I missing?

rtmclay commented 1 year ago

When Lmod loads a module, any setenv() function pushes the value into the current environment, so that your setenv() followed by a prepend_path() works. This feature existed in Tmod for a long time, so I reproduced it in Lmod.

However, when Lmod is "loading" a module while performing a spider cache build, any setenv() command is currently ignored. That is why your use of a local Lua variable works in both a regular load and a spider load. I will change Lmod so that you can use setenv() followed by a prepend_path() like in your example without requiring a local Lua variable.

But it is a little complicated, because I'll have to restore the original environment after each module is evaluated (AKA loaded) in spider mode.

I'll update you when this fix is available.
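
To see how a modulefile behaves when it is evaluated in spider mode rather than in a regular load, the spider can be run directly on a module tree. The helper below is a sketch: `check_spider` is a made-up name, and it assumes that LMOD_DIR points at the Lmod libexec directory (as set by the Lmod init scripts) and that `$LMOD_DIR/spider -o spiderT` is available in the installed version.

```shell
# Sketch: check whether the spider can walk a module tree without crashing.
# check_spider is a hypothetical helper name.
check_spider() {
    local tree=$1
    if [ -z "${LMOD_DIR:-}" ] || [ ! -x "$LMOD_DIR/spider" ]; then
        echo "Lmod spider not found (is LMOD_DIR set?)" >&2
        return 2
    fi
    # Build the spider table for this tree only; discard the table itself,
    # but leave error output (such as a Lua stack traceback) visible.
    if "$LMOD_DIR/spider" -o spiderT "$tree" > /dev/null; then
        echo "spider walked $tree without errors"
    else
        echo "spider failed on $tree" >&2
        return 1
    fi
}
```

Calling `check_spider /path/to/modulefiles` on a small tree keeps the output short enough to attach to a bug report.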

wpoely86 commented 1 year ago

It's not a major issue to use a local Lua variable, but I thought this was the only way. Does this trick only work for setenv?

rtmclay commented 1 year ago

It works for both setenv and pushenv.

smoors commented 1 year ago

so, there are 2 issues:

to solve the second problem, I found the following workaround:

execute {cmd="ml use $VSC_SCRATCH/yales2/modules",modeA={"load"}}
execute {cmd="ml unuse $VSC_SCRATCH/yales2/modules",modeA={"unload"}}

wpoely86 commented 1 year ago

@rtmclay Is there a more elegant way to get this done? The execute statements I mean.

rtmclay commented 1 year ago

The simple checks of:

1) the directory string starts with a /
2) the directory exists and is readable by the user

The only failure that is left is that you'll get a stack overflow because of an infinite loop. Instead of this execute{}, you ought to run "$LMOD_DIR/check_module_tree_syntax" whenever you update the module tree.
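
A thin wrapper around that checker could look like the sketch below. The function name `check_tree` is made up, and passing the module path as the only argument follows the usage shown in the Lmod documentation; treat the details as assumptions to verify against the installed version.

```shell
# Sketch: run Lmod's syntax checker after changing the module tree.
# check_tree is a hypothetical wrapper name.
check_tree() {
    local checker="${LMOD_DIR:-}/check_module_tree_syntax"
    if [ ! -x "$checker" ]; then
        echo "check_module_tree_syntax not found under LMOD_DIR" >&2
        return 2
    fi
    # Check the tree given as argument, or fall back to the current MODULEPATH.
    "$checker" "${1:-${MODULEPATH:-}}"
}
```

Running `check_tree` with no argument checks everything in the current MODULEPATH, which fits a post-update hook on the module tree.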

smoors commented 1 year ago

the following works too (and is probably less hacky):

if ( mode() ~= "spider" ) then
    prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))
end

rtmclay commented 1 year ago

You only want to do that if you don't want the modules in the $YALES2_HOME/modules directory to be spider-able.

I have modified Lmod so that, in spider mode, setenv() (and pushenv()) set variables in the local environment, just as normal loads do.

Please test Lmod 8.7.15 when you get the chance.

smoors commented 1 year ago

I tested some more. With both Lmod 8.7.15 and 8.7.14, it works unless there is also a setenv that uses os.getenv in the same module file.

for example, the following 2 lines cannot be in the same file:

setenv("Y2_PYTHON_VERSION", os.getenv("EBVERSIONPYTHON"))
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))

rtmclay commented 1 year ago

I am not able to see any errors or other issues when I add another setenv("name",os.getenv("NAME2")) in a module. I have added a new module file in rt/spider/mf4/Core/S/1.0.lua which has:

local yales2_home = "/unknown/a/b/c"
setenv("Y2_PYTHON_VERSION", os.getenv("EBVERSIONPYTHON"))
prepend_path("MODULEPATH", pathJoin(yales2_home, "modules"))

and EBVERSIONPYTHON is set to the string "3.7" in the environment. Please provide a bugReport example that shows the issue.

smoors commented 1 year ago

I tested this again and now the issue is gone. It must have been an error in the module tree. Thanks a lot for helping out!