aws-samples / aws-eda-slurm-cluster

AWS Slurm Cluster for EDA Workloads
MIT No Attribution

[BUG] Modulefile not created #167

Closed: cartalla closed this issue 7 months ago

cartalla commented 9 months ago

Describe the bug

After configuring the virtual desktop as a submitter, new shells report an error that the modulefile directory doesn't exist:

```
ModuleCmd_Use.c(231):ERROR:64: Directory '/opt/slurm/edapc-3-7-1-c7-x86-1/config/modules/modulefiles/CentOS/7/x86_64' not found
```
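A quick way to confirm this symptom on a submitter is to check whether the directory that `module use` points at actually exists. The sketch below is a hypothetical diagnostic, not part of the repository: the path layout (`<slurm_root>/<cluster>/config/modules/modulefiles/<distribution>/<version>/<arch>`) is inferred only from the error message above, and the function names are illustrative.

```python
from pathlib import Path


def modulefile_dir(slurm_root: str, cluster: str, distribution: str,
                   version: str, arch: str) -> Path:
    """Build the modulefile directory path.

    Assumption: layout inferred from the single error message in this issue.
    """
    return (Path(slurm_root) / cluster / "config" / "modules" / "modulefiles"
            / distribution / version / arch)


def check_modulefile_dir(path: Path) -> bool:
    """Return True if the directory exists; otherwise print a hint and return False."""
    if path.is_dir():
        return True
    print(f"Modulefile directory {path} not found; check the head node "
          f"configuration output for a failed playbook task.")
    return False


if __name__ == "__main__":
    # Values taken from the error message reported in this issue.
    d = modulefile_dir("/opt/slurm", "edapc-3-7-1-c7-x86-1", "CentOS", "7", "x86_64")
    check_modulefile_dir(d)
```

A missing directory here points back at the head node configuration rather than at the submitter's shell setup.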

Expected behavior

The modulefile exists after the cluster is created.

Repository Version

v2.0.1

cartalla commented 9 months ago

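The Ansible playbook run by on_head_node_configured.sh failed before the modulefile was created. The task that runs create_slurm_accounts.py exited with a non-zero return code: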

```
on_head_node_configured.sh: TASK [ParallelClusterHeadNode : Run /opt/slurm/config/bin/create_slurm_accounts.py to make sure it works] ***
on_head_node_configured.sh: fatal: [local]: FAILED! => changed=true
on_head_node_configured.sh:   cmd: |-
on_head_node_configured.sh:     set -ex
on_head_node_configured.sh:
on_head_node_configured.sh:     export SLURM_ROOT=/opt/slurm
on_head_node_configured.sh:     /opt/slurm/config/bin/create_slurm_accounts.py --accounts /opt/slurm/config/accounts.yml --users /opt/slurm/config/users_groups.json --default-account unassigned
on_head_node_configured.sh:   delta: '0:00:11.450660'
on_head_node_configured.sh:   end: '2023-10-05 09:12:11.548713'
on_head_node_configured.sh:   msg: non-zero return code
on_head_node_configured.sh:   rc: 1
on_head_node_configured.sh:   start: '2023-10-05 09:12:00.098053'
on_head_node_configured.sh:   stderr: |-
on_head_node_configured.sh:     + export SLURM_ROOT=/opt/slurm
on_head_node_configured.sh:     + SLURM_ROOT=/opt/slurm
on_head_node_configured.sh:     + /opt/slurm/config/bin/create_slurm_accounts.py --accounts /opt/slurm/config/accounts.yml --users /opt/slurm/config/users_groups.json --default-account unassigned
on_head_node_configured.sh:     ERROR:root:Unhandled exception in /opt/slurm/config/bin/create_slurm_accounts.py
on_head_node_configured.sh:     Traceback (most recent call last):
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 288, in <module>
on_head_node_configured.sh:         app = SlurmAccountManager(args.accounts, args.users, args.default_account)
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 79, in __init__
on_head_node_configured.sh:         number_of_changes = self.update_slurm()
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 214, in update_slurm
on_head_node_configured.sh:         raise RuntimeError("Some slurm updates failed")
on_head_node_configured.sh:     RuntimeError: Some slurm updates failed
on_head_node_configured.sh:     Traceback (most recent call last):
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 288, in <module>
on_head_node_configured.sh:         app = SlurmAccountManager(args.accounts, args.users, args.default_account)
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 79, in __init__
on_head_node_configured.sh:         number_of_changes = self.update_slurm()
on_head_node_configured.sh:       File "/opt/slurm/config/bin/create_slurm_accounts.py", line 214, in update_slurm
on_head_node_configured.sh:         raise RuntimeError("Some slurm updates failed")
on_head_node_configured.sh:     RuntimeError: Some slurm updates failed
on_head_node_configured.sh:   stderr_lines: <omitted>
on_head_node_configured.sh:   stdout: ''
on_head_node_configured.sh:   stdout_lines: <omitted>
on_head_node_configured.sh:
```
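The traceback shows that `update_slurm` raises a generic `RuntimeError("Some slurm updates failed")`, so the log does not say which update failed. The sketch below is a hypothetical illustration of a "collect failures, then raise once" pattern that names the failing commands in the exception; it is not the script's actual implementation. The `apply_updates` and `run_command` names are invented for illustration, and the runner is injectable so the logic can be tested without a Slurm installation.

```python
import logging
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command (argv list) and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command without raising on failure and return its exit status."""
    return subprocess.run(list(cmd), check=False).returncode


def apply_updates(commands: List[Sequence[str]], runner: Runner = run_command) -> int:
    """Run every update; if any fail, raise one error that lists them.

    Hypothetical sketch: returns the number of commands run when all succeed.
    """
    failed = []
    for cmd in commands:
        rc = runner(cmd)
        if rc != 0:
            # Log each failure as it happens so it appears in the playbook output.
            logging.error("Command failed (rc=%d): %s", rc, " ".join(cmd))
            failed.append(" ".join(cmd))
    if failed:
        raise RuntimeError(f"Some slurm updates failed: {'; '.join(failed)}")
    return len(commands)
```

Running all updates before raising means one bad account or user entry does not hide others, while the single exception still fails the Ansible task as it does today.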