cartalla closed 9 months ago
I can see that on_head_node_updated.sh ran by looking in /var/log/cfn-init-cmd.log.
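Failed Ansible tasks can be pulled out of the cfn-init command log with a short script instead of scrolling through it by hand. The sketch below assumes two things inferred from the excerpts in this issue rather than from a documented format: every line carries a `<date> <time> P<pid> [<LEVEL>]` prefix, and each failure block ends at the `stdout_lines:` key. `failed_task_blocks` is a hypothetical helper name.

```python
import re

# Each log line starts with "<date> <time> P<pid> [<LEVEL>] ".
# Assumption: this pattern is inferred from the excerpts in this issue.
PREFIX = re.compile(r"^\S+ \S+ P\d+ \[\w+\] ?")


def failed_task_blocks(lines):
    """Return the Ansible failure blocks found in cfn-init-cmd.log lines.

    A block starts at a 'fatal: ... FAILED!' line and, assuming Ansible's
    alphabetical key order, ends at the 'stdout_lines:' key (or at EOF).
    """
    blocks, current = [], None
    for raw in lines:
        # Drop the cfn-init prefix so only the Ansible output remains.
        line = PREFIX.sub("", raw.rstrip("\n"))
        if line.startswith("fatal:") and "FAILED!" in line:
            if current:
                blocks.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
            if line.startswith("stdout_lines:"):
                blocks.append(current)
                current = None
    if current:
        blocks.append(current)
    return blocks
```

On a node, pass it an open file, for example `failed_task_blocks(open("/var/log/cfn-init-cmd.log"))`, and print each block.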
I see that the following Ansible task failed:
2023-09-27 10:43:15,899 P17476 [INFO] fatal: [local]: FAILED! => changed=true
2023-09-27 10:43:15,899 P17476 [INFO] cmd: |-
2023-09-27 10:43:15,899 P17476 [INFO] set -ex
2023-09-27 10:43:15,899 P17476 [INFO]
2023-09-27 10:43:15,899 P17476 [INFO] /opt/slurm/config/bin//create_users_groups.py -i /opt/slurm/config/users_groups.json
2023-09-27 10:43:15,899 P17476 [INFO] delta: '0:00:00.038857'
2023-09-27 10:43:15,899 P17476 [INFO] end: '2023-09-27 10:43:15.566711'
2023-09-27 10:43:15,899 P17476 [INFO] msg: non-zero return code
2023-09-27 10:43:15,899 P17476 [INFO] rc: 1
2023-09-27 10:43:15,899 P17476 [INFO] start: '2023-09-27 10:43:15.527854'
2023-09-27 10:43:15,899 P17476 [INFO] stderr: |-
2023-09-27 10:43:15,899 P17476 [INFO] + /opt/slurm/config/bin//create_users_groups.py -i /opt/slurm/config/users_groups.json
2023-09-27 10:43:15,899 P17476 [INFO] Traceback (most recent call last):
2023-09-27 10:43:15,899 P17476 [INFO] File "/opt/slurm/config/bin//create_users_groups.py", line 109, in <module>
2023-09-27 10:43:15,899 P17476 [INFO] main(args.filename)
2023-09-27 10:43:15,899 P17476 [INFO] File "/opt/slurm/config/bin//create_users_groups.py", line 49, in main
2023-09-27 10:43:15,899 P17476 [INFO] subprocess.check_output(['groupadd', '-g', gid, group_name], stderr=subprocess.STDOUT)
2023-09-27 10:43:15,900 P17476 [INFO] File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
2023-09-27 10:43:15,900 P17476 [INFO] **kwargs).stdout
2023-09-27 10:43:15,900 P17476 [INFO] File "/usr/lib64/python3.6/subprocess.py", line 423, in run
2023-09-27 10:43:15,900 P17476 [INFO] with Popen(*popenargs, **kwargs) as process:
2023-09-27 10:43:15,900 P17476 [INFO] File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
2023-09-27 10:43:15,900 P17476 [INFO] restore_signals, start_new_session)
2023-09-27 10:43:15,900 P17476 [INFO] File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
2023-09-27 10:43:15,900 P17476 [INFO] raise child_exception_type(errno_num, err_msg, err_filename)
2023-09-27 10:43:15,900 P17476 [INFO] FileNotFoundError: [Errno 2] No such file or directory: 'groupadd': 'groupadd'
2023-09-27 10:43:15,900 P17476 [INFO] stderr_lines: <omitted>
2023-09-27 10:43:15,900 P17476 [INFO] stdout: ''
2023-09-27 10:43:15,900 P17476 [INFO] stdout_lines: <omitted>
2023-09-27 10:43:15,900 P17476 [INFO]
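The traceback means `subprocess` could not find a `groupadd` executable at all. Either the package that provides it is not installed, or the process ran with a `PATH` that omits the sbin directories. A minimal sketch of a more defensive lookup follows, assuming the fix belongs in `create_users_groups.py`. `find_admin_command` and `add_group` are hypothetical names, and the extra search directories are an assumption about typical Linux layouts.

```python
import os
import shutil
import subprocess

# Directories where admin tools such as groupadd usually live on Linux.
# Assumption: the cfn-init environment may run with a PATH that omits them.
SBIN_DIRS = ("/usr/sbin", "/sbin", "/usr/local/sbin")


def find_admin_command(name, extra_dirs=SBIN_DIRS):
    """Return an absolute path to `name`, searching PATH first, then extra_dirs."""
    found = shutil.which(name)
    if found:
        return found
    found = shutil.which(name, path=os.pathsep.join(extra_dirs))
    if found:
        return found
    raise FileNotFoundError(
        "{} not found in PATH or {}; is it installed?".format(name, ", ".join(extra_dirs))
    )


def add_group(gid, group_name):
    """Hypothetical wrapper mirroring the failing groupadd call (needs root)."""
    groupadd = find_admin_command("groupadd")
    # check_output requires string arguments, so the GID is converted explicitly.
    return subprocess.check_output(
        [groupadd, "-g", str(gid), group_name], stderr=subprocess.STDOUT
    )
```

On RHEL-family distributions, including Amazon Linux, `groupadd` is provided by the `shadow-utils` package. If the lookup still fails, that package is probably missing from the image.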
After resolving this error, I got the following:
2023-09-27 11:36:11,276 P26183 [INFO] fatal: [local]: FAILED! => changed=true
2023-09-27 11:36:11,277 P26183 [INFO] cmd: ifconfig eth0 txqueuelen 4096
2023-09-27 11:36:11,277 P26183 [INFO] delta: '0:00:00.002916'
2023-09-27 11:36:11,277 P26183 [INFO] end: '2023-09-27 11:36:10.936460'
2023-09-27 11:36:11,277 P26183 [INFO] msg: non-zero return code
2023-09-27 11:36:11,277 P26183 [INFO] rc: 127
2023-09-27 11:36:11,277 P26183 [INFO] start: '2023-09-27 11:36:10.933544'
2023-09-27 11:36:11,277 P26183 [INFO] stderr: '/bin/sh: ifconfig: command not found'
2023-09-27 11:36:11,277 P26183 [INFO] stderr_lines: <omitted>
2023-09-27 11:36:11,277 P26183 [INFO] stdout: ''
2023-09-27 11:36:11,277 P26183 [INFO] stdout_lines: <omitted>
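Exit code 127 is the shell's "command not found" status. `ifconfig` comes from the legacy net-tools package, which many newer images no longer install. The same setting can be applied with iproute2's `ip link set dev eth0 txqueuelen 4096`. The sketch below shows this in Python: it prefers `ip` and falls back to `ifconfig` only when `ip` is missing. `txqueuelen_command` and `set_txqueuelen` are hypothetical helpers. In the Ansible task itself, the equivalent fix is simply swapping the command.

```python
import shutil
import subprocess


def txqueuelen_command(interface, qlen, which=shutil.which):
    """Build argv to set an interface's transmit queue length.

    Prefers iproute2's `ip`; falls back to net-tools' `ifconfig` only if
    `ip` is not installed. `which` is injectable to make testing easy.
    """
    ip = which("ip")
    if ip:
        return [ip, "link", "set", "dev", interface, "txqueuelen", str(qlen)]
    ifconfig = which("ifconfig")
    if ifconfig:
        return [ifconfig, interface, "txqueuelen", str(qlen)]
    raise FileNotFoundError("neither 'ip' nor 'ifconfig' is available")


def set_txqueuelen(interface="eth0", qlen=4096):
    # Requires root; mirrors the failing task `ifconfig eth0 txqueuelen 4096`.
    subprocess.check_call(txqueuelen_command(interface, qlen))
```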
Describe the bug
After an update to change instance types, I get the following error from Slurm commands:
The path was supposed to be updated to /opt/slurm/ClusterName/etc so that it works on both the controller and the submitter instances.
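Slurm client commands such as `sinfo` and `squeue` use the `SLURM_CONF` environment variable to locate `slurm.conf`. The sketch below assumes the per-cluster layout described above (`/opt/slurm/<ClusterName>/etc/slurm.conf`) and that the controller and the submitters should resolve the same file. The helper names and the cluster names in the test are hypothetical.

```python
import os

SLURM_ROOT = "/opt/slurm"  # mount point taken from the logs above


def cluster_slurm_conf(cluster_name, root=SLURM_ROOT):
    """Expected per-cluster config path: <root>/<ClusterName>/etc/slurm.conf."""
    return os.path.join(root, cluster_name, "etc", "slurm.conf")


def slurm_env(cluster_name, base_env=None, root=SLURM_ROOT):
    """Return a copy of the environment with SLURM_CONF set for this cluster.

    Setting the same value on the controller and on submitters means both
    read the same slurm.conf.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SLURM_CONF"] = cluster_slurm_conf(cluster_name, root)
    return env
```

For example, `subprocess.run(["sinfo"], env=slurm_env("mycluster"))` would run `sinfo` against that cluster's config, where `mycluster` is a placeholder.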
Expected behavior
Slurm commands work.