Open starlitsky2010 opened 1 year ago
Hey @starlitsky2010! Are Pyxis and enroot installed on all nodes in the cluster, and at the same version on each? Your Slurm version is a bit older than what we've tested previously, so it may benefit from an update if that's practical. The oldest versions I'm aware of that we've documented with NeMo Framework on Slurm were the following:
So your Pyxis version should be fine, but Slurm could potentially be updated, though I can't say that's definitively the problem at the moment.
Was Pyxis/enroot installed recently? Have the Slurm daemons been restarted?
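One way to answer the "same version on every node" question is to collect what each node reports and compare the values. The sketch below is only an illustration under assumptions, not a procedure from this thread: `versions_consistent` is a hypothetical helper name, and it expects `node version` pairs on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "node version" pairs on stdin and succeeds
# only when every node reports the same version string.
versions_consistent() {
    awk 'NF >= 2 { seen[$2] = 1 }
         END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Restarting the daemons after installing or upgrading a plugin
# (unit names as shipped by the Ubuntu slurmctld/slurmd packages):
#   sudo systemctl restart slurmctld   # on the controller node
#   sudo systemctl restart slurmd      # on each compute node
```

For example, the pairs could come from something like `srun -N 2 --ntasks-per-node=1 bash -c 'echo "$(hostname) $(enroot version)"' | versions_consistent`, where the node count of 2 is an assumed value to adjust for the cluster.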
Hi @roclark ,
Pyxis and enroot are installed on all nodes. The problem is probably that the Slurm version (19.05.5) is too old and is not compatible with the latest version of Pyxis.
I've tested v0.7.0: when I run srun --help, the container-related options are shown. On Ubuntu 20.04, the command below automatically installs slurm-wlm 19.05.5: sudo apt install slurmd slurmctld -y
Is there an Ubuntu version you recommend? How did you install Slurm? Could you share some links about it?
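The srun --help check described above can be scripted. This is a minimal sketch, assuming that the presence of the `--container-image` flag in the help text is a good enough signal that the Pyxis plugin was loaded; `has_pyxis_options` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the help text on stdin mentions the
# --container-image option that Pyxis adds to srun.
has_pyxis_options() {
    grep -q -e '--container-image'
}

# Only attempt the real check on a machine where srun is available.
if command -v srun >/dev/null 2>&1; then
    if srun --help 2>&1 | has_pyxis_options; then
        echo "Pyxis container options are listed by srun --help"
    else
        echo "No container options found in srun --help"
    fi
fi
```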
I'll try the following versions later: Slurm 20.11.7 and Pyxis 0.9.1.
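Before and after an upgrade, the installed Slurm version can be compared against the version being targeted. The sketch below assumes a `sort` that supports `-V` (GNU coreutils does) and uses 20.11.7 only because it is the version named in this thread as the next one to try, not because it is a confirmed minimum. `version_ge` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or
# equal to version $2, using version-aware sorting.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

target="20.11.7"   # assumed target, taken from the thread

# Only attempt the real check on a machine where srun is available.
if command -v srun >/dev/null 2>&1; then
    # srun --version prints a line of the form "slurm <version>".
    installed=$(srun --version | awk '{ print $2 }')
    if version_ge "$installed" "$target"; then
        echo "Slurm $installed is at or above $target"
    else
        echo "Slurm $installed is older than $target"
    fi
fi
```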
Thanks Aaron
Environment:
Pyxis: v0.14.0 Slurm: 19.05.5 enroot: enroot+caps_3.4.1
Get Method:
Error Info:
Thanks Aaron