Closed: zacharygraber closed this 4 months ago
Diff looks good, and I believe you that it works!
@c-mart Appreciate the review/approval! Just to be clear: I don't have permission to merge into this repo.
Should I be worried about this in the `local_create.log` file @zacharygraber?

```
TASK [add local users to compute node] *****************************************
changed: [nicely-still-whippet-compute-base-instance] => {"changed": true, "rc": 0, "stderr": "Shared connection to nicely-still-whippet-compute-base-instance closed.\r\n", "stderr_lines": ["Shared connection to nicely-still-whippet-compute-base-instance closed."], "stdout": "Lmod has detected the\r\nfollowing error: The\r\nfollowing module(s) are\r\nunknown: \"xalt\"\r\n\r\nPlease check the spelling or\r\nversion number. Also try\r\n\"module spider ...\"\r\nIt is also possible your\r\ncache file is out-of-date; it\r\nmay help to try:\r\n $ module --ignore_cache\r\nload \"xalt\"\r\n\r\nAlso make sure that all\r\nmodulefiles written in TCL\r\nstart with the string\r\n#%Module\r\n\r\n\r\n\r\n", "stdout_lines": ["Lmod has detected the", "following error: The", "following module(s) are", "unknown: \"xalt\"", "", "Please check the spelling or", "version number. Also try", "\"module spider ...\"", "It is also possible your", "cache file is out-of-date; it", "may help to try:", " $ module --ignore_cache", "load \"xalt\"", "", "Also make sure that all", "modulefiles written in TCL", "start with the string", "#%Module", "", "", ""]}
```
Update: Slurm test job still works... Hopefully it doesn't break something subtle.
I don't have write access to this repo, so I can't merge it either.
@DImuthuUpe? @c-mart?
Thanks @julianpistorius. I made you an admin.
@julianpistorius Xalt is the tracking software we installed to get usage stats for the software share. These scripts install a bunch of stuff like OpenHPC, which I believe overrides our MODULEPATH at /software (where Xalt is located), so Lmod can't find the Xalt module (it tries to load it on login by default, since Xalt only tracks usage if the module is loaded). It's nothing really to worry about.
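A hedged sketch of what's likely happening and a workaround; the `/software/modulefiles` path below is an assumption for illustration, not a path confirmed by this repo:

```shell
# Assumption: Xalt's modulefile lives under /software/modulefiles, and
# OpenHPC's profile scripts reset MODULEPATH so that directory drops out
# of Lmod's search path. Prepending it back lets Lmod find xalt again.
export MODULEPATH="/software/modulefiles:${MODULEPATH:-}"
echo "$MODULEPATH"

# With Lmod available, the idiomatic equivalents would be:
#   module use /software/modulefiles
#   module --ignore_cache load xalt
```

Until the path is restored, the login-time `module load xalt` will keep printing the "unknown module" warning, but nothing else should break.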
> @zacharygraber: It's nothing really to worry about.
Not sure if it's related, and I didn't get it while testing it using your instructions, but there seems to be a module-related problem: #18
At present, the `rocky-linux` branch fails setup on Jetstream2 (e.g. through Exosphere) due to relying on old versions of the OpenStack API. This PR introduces a couple of changes that allow it to function again:

- Updates `openstacksdk` and `python-openstackclient` up to the latest versions
- … (`access_ipv4` returns null)
- … (`${cluster-name}-compute-...`)

To Test
1. Enable experimental features in Exosphere
2. Create a new Jetstream2 instance from the Featured-RockyLinux8 image (through Exosphere)
3. Set "Create your own SLURM cluster with this instance as the head node" to "Yes"
4. In the Boot Script, replace `{create-cluster-command}` with: …
5. Once setup finishes, verify that you can run Slurm jobs:
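The exact verification command was not preserved in this thread; a typical Slurm smoke test (hypothetical, not taken from the PR) looks like:

```shell
# Write a minimal batch script that runs `hostname` on two tasks.
# Job name, output pattern, and task count are illustrative choices.
cat > hello_slurm.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello_%j.out
#SBATCH --ntasks=2
srun hostname
EOF

# On the cluster head node you would then run:
#   sbatch hello_slurm.sh   # submit the job
#   squeue                  # watch it move from PD (pending) to R (running)
#   cat hello_*.out         # should list the compute node hostnames
```

If the output file lists compute-node hostnames (e.g. the `...-compute-...` instances), scheduling and job launch are working.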