gdevenyi / NeuroAnsible

An ansible playbook to deploy a complete standalone neuroimaging workstation
GNU General Public License v3.0
20 stars 0 forks source link

Some playbook failures result in link-modules.sh never being called, and no module files are generated #30

Closed egarza closed 1 week ago

egarza commented 1 week ago

Finished instalation with no fails, Ubuntu 24.04 WSL 2

It seemed to have install all tags, but there are almost no modules.

egarza@CAMUS:~/NeuroAnsible$ module avail

----------------------------- /opt/quarantine/software/nvhpc/24.3/install/modulefiles -----------------------------
   nvhpc-byo-compiler/24.3    nvhpc-hpcx-cuda12/24.3    nvhpc-nompi/24.3       nvhpc/24.3
   nvhpc-hpcx-cuda11/24.3     nvhpc-hpcx/24.3           nvhpc-openmpi3/24.3

---------------------------------------- /usr/share/lmod/lmod/modulefiles -----------------------------------------
   Core/lmod (L)    Core/settarg (D)

  Where:
   L:  Module is loaded
   D:  Default Module

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
gdevenyi commented 1 week ago

WorksForMe, 4 successfully installs in 22.04 and 24.04 so far.

https://github.com/gdevenyi/NeuroAnsible/blob/master/roles/science/tasks/modules.yml

  1. Did you close and open your terminal?
  2. Is $QUARANTINE_PATH defined?
  3. What is the content of /etc/profile.d
egarza commented 1 week ago
  1. Yes
  2. /opt/quarantine
  3. egarza@CAMUS:~$ ls /etc/profile.d
    01-locale-fix.sh  Z99-cloud-locale-test.sh   apps-bin-path.sh    gawk.csh  lmod.sh         zznvidia_modules.sh
    Z97-byobu.sh      Z99-cloudinit-warnings.sh  bash_completion.sh  gawk.sh   update-motd.sh  zzquarantine_path.sh
gdevenyi commented 1 week ago

find /opt/quarantine/modulefiles

egarza commented 1 week ago

Here is the content:

egarza@CAMUS:~$ ls  /opt/quarantine/modulefiles/
link-modules.sh
egarza@CAMUS:~$ cat /opt/quarantine/modulefiles/link-modules.sh
#!/bin/bash

set -euo pipefail

for file in ../software/*/*/module; do
        #echo ${file}
        mkdir -p $(basename $(dirname $(dirname ${file})))
        ln -f --relative -s ${file} $(basename $(dirname $(dirname ${file})))/$(basename $(dirname ${file})).lua
done
gdevenyi commented 1 week ago

Ok, so this was what I suspected, you did not have a complete successful run of the pipeline from beginning to end (this is indeed due to unstable network issues on your end, I've confirmed the error you're getting is something on your side). You got into a place where the link-modules.sh handler never ran because the playbook errored out, and then it didn't get triggered to run again.

I will have to add explicit linking and drop the handler, I was hoping to avoid this some that the playbook is truly idempotent as ansible playbooks are supposed to be.

Your fix is.

cd /opt/quarantine/modulefiles && sudo bash link-modules.sh

egarza commented 1 week ago

now its working, thanks