Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

Previous and new users being stuck in pending status #274

Open thomasj20git opened 1 month ago

thomasj20git commented 1 month ago

I ran into an issue with my cluster. My users had all been created, and I could log in to each of the nodes as those users, but now I can't: they are stuck in a pending-creation state.

[Image: Screenshot 2024-07-16 at 7 48 19 AM]

thomasj20git commented 1 month ago

Active

Failed to execute cluster-init script '/mnt/cluster-init/slurm/login/scripts/00-install-login.sh' in project 'slurm' (return code: 1)

Software Configuration
Edit and re-upload the script to correct the error and try again
Get more help on this issue

Detail:
Script output:
An error occured during installation. See log file /var/log/azure-slurm-install.log for details.
2024-07-15 16:33:39,877 ERROR: An error occured during installation.
Traceback (most recent call last):
  File "install.py", line 631, in <module>
    main()
  File "install.py", line 604, in main
    munge_key(settings)
  File "install.py", line 207, in munge_key
    ilib.copy_file(
  File "/opt/cycle/jetpack/system/bootstrap/azure-slurm-install/installlib.py", line 148, in copy_file
    shutil.copyfile(src=source, dst=dest)
  File "/opt/cycle/jetpack/system/embedded/lib/python3.8/shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/sched/clusterpoc/munge.key'
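The traceback only says that `shutil.copyfile` could not open a path. It does not say whether the key file itself, the `clusterpoc` directory, or the whole `/sched` mount is missing on the login node. A minimal diagnostic sketch to narrow that down might look like the following. `first_missing_component` is a hypothetical helper written for this thread, not part of azure-slurm-install, and the path is just the one from the error above.

```python
import os
from typing import Optional


def first_missing_component(path: str) -> Optional[str]:
    """Return the shortest prefix of `path` that does not exist, or None if `path` exists."""
    path = os.path.abspath(path)
    if os.path.exists(path):
        return None
    missing = path
    parent = os.path.dirname(path)
    # Walk upward until we reach a directory that exists (or the filesystem root).
    while parent != missing and not os.path.exists(parent):
        missing = parent
        parent = os.path.dirname(parent)
    return missing


if __name__ == "__main__":
    # Path taken from the FileNotFoundError above; run this on the failing node.
    print(first_missing_component("/sched/clusterpoc/munge.key"))
```

If this reports `/sched`, the shared directory is probably not mounted on that node. If it reports the full key path, the mount is there but the key was never written to it.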

aditigaur4 commented 1 month ago

Did you restart the cluster when you saw the cluster-init error? Which version of CycleCloud (CC) is this?