TACC / Lmod

Lmod: An Environment Module System based on Lua, Reads TCL Modules, Supports a Software Hierarchy
http://lmod.readthedocs.org

Is there an internal limit on the length of DYLD_LIBRARY_PATH for lmod/lua? #685

Closed climbfuji closed 8 months ago

climbfuji commented 9 months ago

Notes

Not sure if this is the correct place to report this. I am using spack to build a very large environment on macOS, and after loading the spack-generated Lua modules I get this error every time I run a module ... command:

> module list
dyld[1017]: Assertion failed: (next < &roBuffer[0x10000]), function roalloc, file DyldProcessConfig.cpp, line 653

It must have something to do with the DYLD_LIBRARY_PATH and/or DYLD_FALLBACK_LIBRARY_PATH environment variables - most likely the length. If I unset one of them (unset DYLD_LIBRARY_PATH), the module list command works again. The length of DYLD_LIBRARY_PATH is - hold on to something - 34,190 characters (which is strikingly close to 32,768, and given that I recently updated to a newer version of spack with more dependencies = longer paths, that seems to confirm my suspicion).
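To put numbers on this, the lengths of both variables can be checked directly in the affected shell. The following is a minimal sketch, not part of Lmod or macOS; `var_length` is a hypothetical helper name, and it assumes the arguments are valid variable names.

```shell
# var_length NAME...: print "NAME LENGTH" for each named variable.
# Hypothetical helper for illustration; an unset variable reports 0.
var_length() {
    for name in "$@"; do
        # Copy the value of the variable named by $name into $value.
        # eval is used so the sketch also works in shells without ${!name}.
        eval "value=\${$name-}"
        printf '%s %d\n' "$name" "${#value}"
    done
}

var_length DYLD_LIBRARY_PATH DYLD_FALLBACK_LIBRARY_PATH
```

Comparing the two numbers (and their sum) before and after loading the modules shows how much the environment grows.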

My system

- macOS Monterey 12.6.6 on Apple M1 running in native (aarch64) mode
- lmod installed via homebrew, version 8.7.29
- lua installed via homebrew, version 5.4.6
- shell: declare -x SHELL="/opt/homebrew/bin/bash"

Describe the bug

See above

To Reproduce

One thing I tried was to start with a fresh shell (no modules loaded) and simply export the DYLD_LIBRARY_PATH from the broken shell, but that didn't trigger the problem.

Next, I exported both DYLD_LIBRARY_PATH and DYLD_FALLBACK_LIBRARY_PATH (about the same length), and that triggered the error. I also changed the contents (paths) of these two variables to some invalid paths to test whether the error is caused by a rogue library/executable being found, but I still got the error. So I assume that the combination of an excessively long DYLD_LIBRARY_PATH and an excessively long DYLD_FALLBACK_LIBRARY_PATH is the problem?
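A test like the one described above can be approximated without the real paths. The sketch below builds two synthetic values of roughly the reported length from directories that do not exist. `make_long_path` and the placeholder directory names are hypothetical, and 34000 is only an approximation of the 34,190 characters reported. The assertion message refers to a bound of 0x10000 (65,536 bytes); whether the combined size of the two variables is what exceeds it is an assumption that this thread does not confirm.

```shell
# make_long_path N: print a colon-separated list of non-existent
# directories whose total length is at least N characters.
# Hypothetical helper for illustration only.
make_long_path() {
    target=$1
    out=""
    i=0
    while [ "${#out}" -lt "$target" ]; do
        # Add a ":" separator only when something is already in the list.
        out="${out}${out:+:}/nonexistent/placeholder/dir_${i}/lib"
        i=$((i + 1))
    done
    printf '%s' "$out"
}

# Run in a subshell so the current shell's environment stays untouched.
(
    export DYLD_LIBRARY_PATH="$(make_long_path 34000)"
    export DYLD_FALLBACK_LIBRARY_PATH="$(make_long_path 34000)"
    echo "combined length: $(( ${#DYLD_LIBRARY_PATH} + ${#DYLD_FALLBACK_LIBRARY_PATH} ))"
    # module list   # uncomment in a shell where Lmod has been initialized
)
```

Lowering the target lengths step by step would show whether the failure depends on one variable or on the sum of both.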

Expected behavior

No errors.

Desktop (please complete the following information):

Modules based on Lua: Version 8.7.29 2023-07-10 17:58 -06:00
by Robert McLay mclay@tacc.utexas.edu

Changes from Default Configuration

Name Where Set Default Value
LFS_VERSION D 1.6.3 1.8.0
LMOD_PACKAGE_PATH D nil
LMOD_PAGER C less /usr/bin/less
LMOD_SITEPACKAGE_LOCATION Other /opt/homebrew/Cellar/lmod/8.7.29/libexec/SitePackage.lua
LMOD_SYSTEM_DEFAULT_MODULES D unknown
LMOD_TCLSH C tclsh /usr/bin/tclsh
MODULEPATH_ROOT E /opt/homebrew/Cellar/lmod/8.7.29/modulefiles
PATH_TO_LUA C lua /opt/homebrew/opt/lua/bin/lua
SITE_CONTROLLED_PREFIX C no yes

Where Set -> D: default, E: environment, C: configuration
lmod_cfg: lmod_config.lua
SitePkg: SitePackage
StdPkg: StandardPackage
Other: Set somewhere outside of normal locations



rtmclay commented 9 months ago

There is no internal limit for strings in Lmod. What shell are you using on the Mac?

You should try

$ $LMOD_CMD shell list > lmod.out
$ . ./lmod.out

and see what happens. Does the error occur when running Lmod or when sourcing ./lmod.out?

It seems likely that this is a shell problem and not an Lmod problem. But if you can show that your DYLD_* variables are somehow getting messed up inside of Lmod, then it could be an Lmod problem.
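The two-step check suggested above can also be wrapped so that it reports which step failed. This is only a sketch: `check_stages` is a hypothetical helper, not an Lmod command, and the temporary file location is an assumption.

```shell
# check_stages CMD [ARGS...]: run CMD, which is expected to print shell
# code on stdout, then source that output in the current shell.
# Returns 1 if CMD itself fails and 2 if sourcing its output fails.
# Hypothetical helper for illustration only.
check_stages() {
    out_file="${TMPDIR:-/tmp}/check_stages.$$"
    if ! "$@" > "$out_file"; then
        echo "stage 1 failed: the command itself exited non-zero"
        rm -f "$out_file"
        return 1
    fi
    if ! . "$out_file"; then
        echo "stage 2 failed: sourcing the generated output failed"
        rm -f "$out_file"
        return 2
    fi
    rm -f "$out_file"
    echo "both stages succeeded"
}

# Usage in a shell where Lmod is initialized:
# check_stages "$LMOD_CMD" shell list
```

A failure in stage 1 points at the process that Lmod runs in; a failure in stage 2 points at the shell evaluating the generated commands.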

climbfuji commented 9 months ago

Thanks for your quick reply.

  1. I can't generate lmod.out from the broken shell, because:

    > $LMOD_CMD shell list > lmod.out
    dyld[20041]: Assertion failed: (next < &roBuffer[0x10000]), function roalloc, file DyldProcessConfig.cpp, line 653.
  2. Of course, if I run this from a clean shell without the offending module loaded, there is no error whatsoever:

    > $LMOD_CMD shell list > lmod.out
    No modules loaded
    heinzell@JCSDA-L-18146:~ [brew-arch64]> cat lmod.out
    __LMOD_REF_COUNT_MODULEPATH=/opt/homebrew/Cellar/lmod/8.7.29/modulefiles/Darwin:1\;/opt/homebrew/Cellar/lmod/8.7.29/modulefiles/Core:1;
    export __LMOD_REF_COUNT_MODULEPATH;
    MODULEPATH=/opt/homebrew/Cellar/lmod/8.7.29/modulefiles/Darwin:/opt/homebrew/Cellar/lmod/8.7.29/modulefiles/Core;
    export MODULEPATH;
    _ModuleTable001_=X01vZHVsZVRhYmxlXyA9IHsKTVR2ZXJzaW9uID0gMywKY19yZWJ1aWxkVGltZSA9IGZhbHNlLApjX3Nob3J0VGltZSA9IGZhbHNlLApkZXB0aFQgPSB7fSwKZmFtaWx5ID0ge30sCm1UID0ge30sCm1wYXRoQSA9IHsKIi9vcHQvaG9tZWJyZXcvQ2VsbGFyL2xtb2QvOC43LjI5L21vZHVsZWZpbGVzL0RhcndpbiIsICIvb3B0L2hvbWVicmV3L0NlbGxhci9sbW9kLzguNy4yOS9tb2R1bGVmaWxlcy9Db3JlIiwKfSwKc3lzdGVtQmFzZU1QQVRIID0gIi9vcHQvaG9tZWJyZXcvQ2VsbGFyL2xtb2QvOC43LjI5L21vZHVsZWZpbGVzL0Rhcndpbjovb3B0L2hvbWVicmV3L0NlbGxhci9sbW9kLzguNy4yOS9tb2R1bGVmaWxlcy9Db3JlIiwKfQo=;
    export _ModuleTable001_;
    _ModuleTable_Sz_=1;
    export _ModuleTable_Sz_;
    heinzell@JCSDA-L-18146:~ [brew-arch64]> . ./lmod.out
    heinzell@JCSDA-L-18146:~ [brew-arch64]>

Is this what you wanted me to try?

rtmclay commented 9 months ago

If you want this looked at you will need to provide a test case that the Lmod team can run. Please use the bugReport template as a guide to provide a module or modules that reproduce your issue. Thanks!

climbfuji commented 9 months ago

So, on my system this was really easy to replicate:

# Start with a fresh shell (/opt/homebrew/bin/bash in my case)
> source /opt/homebrew/opt/lmod/init/profile
> module li
No modules loaded
> # Set the two env variables as in the attached file `export_dyld_paths.tar.gz`
> module li
dyld[16840]: Assertion failed: (next < &roBuffer[0x10000]), function roalloc, file DyldProcessConfig.cpp, line 653.

export_dyld_paths.tar.gz

Is that good enough for a start?

rtmclay commented 9 months ago

I am unable to reproduce your issue on my Mac. It works correctly under Linux and macOS. I am running Ventura 13.6.2.

I am running Lmod 8.7.34 and I tested it with both zsh and bash.

climbfuji commented 9 months ago

> I am unable to reproduce your issue on my Mac. It works correctly under linux and MacOS. I am running Ventura 13.6.2
>
> I am running Lmod 8.7.34 and I tested it with both zsh and bash.

Thanks for testing! I am going to try this on a few other Macs, too (AWS is your best friend). The system reported above has a slightly older Lmod and also an older OS.

Are you using the native bash shell, or the homebrew bash?

rtmclay commented 9 months ago

homebrew bash

climbfuji commented 9 months ago

Sorry for the late reply. I couldn't replicate this error on my other systems - closing. Sorry for the false alert, and thanks for your help.

rtmclay commented 8 months ago

Great! Closing this issue.