NCAR / spack-gust

Spack production user software stack on the Gust test system

Intermittent failure loading a module collection #4

Closed roryck closed 2 years ago

roryck commented 2 years ago

I have a module collection 'goes_gpu' consisting of these modules:

  1) crayenv/22.08
  2) ncarenv/22.08
  3) cuda/11.7
  4) cudnn/8.2.4.15-11.4
  5) gcc
  6) craype
  7) cray-dsmml
  8) craype-network-ofi
  9) cray-mpich
 10) cray-libsci
 11) PrgEnv-gnu/8.3.3

Sometimes 'module gd goes_gpu' loads the collection correctly, but at other times (maybe always on compute nodes?) the command fails with:

Lmod has detected the following error:  /opt/cray/pe/modulefiles/PrgEnv-gnu/8.3.3: (PrgEnv-gnu/8.3.3): can't read "env(LOADEDMODULES)": no such variable
While processing the following module(s):
    Module fullname   Module Filename
    ---------------   ---------------
    PrgEnv-gnu/8.3.3  /opt/cray/pe/modulefiles/PrgEnv-gnu/8.3.3

and instead I have to separately load:

module load crayenv/22.08 ncarenv/22.08 cuda cudnn
module load PrgEnv-gnu
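
A possible diagnostic for the error above, offered as a sketch rather than a confirmed fix: the Tcl message says the PrgEnv-gnu modulefile tried to read LOADEDMODULES while it was not defined. The helper below only reports whether that variable is defined in the current shell. The function name check_loadedmodules is hypothetical, and whether an undefined variable is the actual trigger on compute nodes is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: report whether LOADEDMODULES is defined in this shell.
# Assumption: the Tcl error can't read "env(LOADEDMODULES)" appears when the
# modulefile reads the variable while it is absent from the environment.
# Note: this checks the shell variable and does not verify that it is exported.
check_loadedmodules() {
    # ${VAR+x} expands to "x" when VAR is set, even to an empty string.
    if [ -n "${LOADEDMODULES+x}" ]; then
        echo "LOADEDMODULES is set"
    else
        echo "LOADEDMODULES is unset"
    fi
}
```

Running check_loadedmodules on a login node and on a compute node just before 'module gd goes_gpu' would show whether the two environments differ in this respect.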

vanderwb commented 2 years ago

This module collection actually shouldn't work at all, because the two "envs" (crayenv and ncarenv) should be mutually exclusive. The fact that I've allowed you to load both together is itself a bug. Let me sort that out, and then we can revisit whatever is still broken.

vanderwb commented 2 years ago

@roryck has this popped up again since the envs were made exclusive?

roryck commented 2 years ago

No, I haven't encountered this again. Happy to assume it's fixed for now.

roryck commented 2 years ago

Fixed