Closed dorrellmw closed 2 years ago
To answer your first group of questions: the empty .modulerc.lua causes Lmod to treat the two hidden modules as Name-version-version files. You can read about that in the lmod.readthedocs.io pages. The point is that all 3 modulefiles share the short name ABC, and the version can be 2.0.0, .cpu/2.0.0, or .gpu/2.0.0.
As far as I know there are no downsides to doing it this way except for the problem you have pointed out.
I will have to track down why the ref counts for the ABC/2.0.0 module still live on even though the module has been removed.
By the way, you could rename the .cpu and .gpu modules to something like ABC_helper/{.cpu,.gpu} and have both ABC and ABC_helper loaded at the same time.
Well, that was a subtle bug, but I found the problem. It only happens when two modulefiles share the same "sn" (the short name), which in this case is "ABC". I have created a new version of Lmod (8.5.12) which solves this problem for me.
When you get a chance, please test Lmod 8.5.12. Thanks very much for the bug report!
O.K. to close this issue?
I'm sorry for losing track of this issue, but thank you for resolving the bug!
Context: I'm deploying software ABC for a cluster which has "cpu nodes" and "gpu nodes", and ABC needs separate builds for the two kinds of nodes.
I want the users to be able to module load ABC (or module load ABC/2.0.0) and automatically get the correct build for the system which is loading the module (yes, there are caveats to this, but that's not the point of this issue). I have found that this directory structure works:

ABC/2.0.0.lua
ABC/.cpu/2.0.0.lua
ABC/.gpu/2.0.0.lua
ABC/.modulerc.lua

where ABC/2.0.0.lua has this content:

and where ABC/.cpu/2.0.0.lua and ABC/.gpu/2.0.0.lua are the actual modules for the cpu and gpu builds of ABC. The remaining file, ABC/.modulerc.lua, is empty.

When you module load ABC and then run module list, you get this:

and module load ABC/2.0.0 or module load ABC/.cpu/2.0.0 both yield the same result. If you now run module load ABC/.gpu/2.0.0, you get this:

and now module load ABC will switch them back. Everything works exactly as I'd like: I can tell users to module load ABC and it will choose the right build transparently behind the scenes.

Is this normal behavior? Why does ABC/.modulerc.lua have to exist for it to work? Are there any corner cases where this fails?

Thanks!
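For anyone who wants to experiment with this setup, here is a minimal sketch that recreates the four files named above in a scratch directory. Only the file names come from this issue. The function name make_abc_layout is made up for illustration, and the modulefiles are created empty as placeholders, so this is not the actual content of any of them.

```shell
#!/usr/bin/env bash
# Sketch: recreate the modulefile layout from this issue in a scratch directory.
# Assumption: the files are empty placeholders; their real contents are not
# reproduced here.
make_abc_layout() {
    local root="$1"
    mkdir -p "$root/ABC/.cpu" "$root/ABC/.gpu"
    # Visible entry point that users load as ABC or ABC/2.0.0.
    touch "$root/ABC/2.0.0.lua"
    # Per-node-type builds, kept under dot-prefixed (hidden) directories.
    touch "$root/ABC/.cpu/2.0.0.lua"
    touch "$root/ABC/.gpu/2.0.0.lua"
    # The empty .modulerc.lua that the setup was reported to need.
    touch "$root/ABC/.modulerc.lua"
}

scratch="$(mktemp -d)"
make_abc_layout "$scratch"
# Show the resulting tree as a sorted list of files.
find "$scratch" -type f | sort
```

With real modulefile contents in place, running module use on the scratch directory should make the tree visible to Lmod for testing without touching the production module path.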
EDIT: I've noticed a side effect that may or may not be harmless. After unloading the module, env | grep -i ABC gives this result:

However, module unload ABC, module unload ABC/2.0.0, and module purge have no effect. As far as I can tell, there is no way to remove those final traces of ABC/2.0.0 (aside from directly changing the environment variables).

These tests were performed in version 8.2.7.
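A rough way to repeat the check above is to wrap the env | grep -i step in a small helper, which also makes it easy to compare results before and after upgrading Lmod. The helper name leftover_vars is made up for this sketch, and it assumes environment values fit on one line (a multi-line value could produce a spurious name).

```shell
#!/usr/bin/env bash
# Sketch: print the names of environment variables whose name or value
# mentions a pattern (case-insensitive), mirroring `env | grep -i ABC`.
# Assumption: each variable occupies a single line of `env` output.
leftover_vars() {
    env | grep -i -- "$1" | cut -d= -f1 | sort -u
}

# Only attempt the unload when the module command is actually available.
if command -v module >/dev/null 2>&1; then
    module unload ABC
    leftover_vars ABC
fi
```

An empty result after the unload would suggest that no traces of ABC remain in the environment.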