BiBiServ / bibigrid

BiBiGrid is a tool for an easy cluster setup inside a cloud environment.
Apache License 2.0

Fix errors after upgrade to Slurm 23.11.0 #470

Closed by XaverStiensmeier 5 months ago

XaverStiensmeier commented 5 months ago
slurmctld: error: The CommunicationParameters option "NoAddrCache" is defunct, please remove it from slurm.conf.
slurmctld: error: If using PrologFlags=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray_aries is required.  If not using pam_slurm_adopt,>

error: The CommunicationParameters option "NoAddrCache" is defunct, please remove it from slurm.conf.
error: If using PrologFlags=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray_aries is required.  If not using pam_slurm_adopt, please ign>
slurmctld: error: Configured MailProg is invalid
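For the first message, the log itself names the remedy: remove "NoAddrCache" from CommunicationParameters in slurm.conf. A minimal sketch of that cleanup is below. It is only an illustration, not BiBiGrid's actual fix; the function name `strip_defunct_comm_params` is hypothetical, and it assumes one `Key=Value` setting per line with no trailing comments.

```python
# Options reported as defunct in the slurmctld log above.
DEFUNCT_COMM_PARAMS = {"noaddrcache"}


def strip_defunct_comm_params(conf_text):
    """Return (cleaned_text, removed_options) for slurm.conf content given as a string.

    Assumption: each setting sits on its own line as Key=Value and the
    CommunicationParameters value is a plain comma-separated list.
    """
    cleaned, removed = [], []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        # slurm.conf keywords are case-insensitive, so compare in lower case.
        if sep and key.strip().lower() == "communicationparameters":
            options = [o.strip() for o in value.split(",") if o.strip()]
            kept = [o for o in options if o.lower() not in DEFUNCT_COMM_PARAMS]
            removed += [o for o in options if o.lower() in DEFUNCT_COMM_PARAMS]
            if not kept:
                continue  # nothing left on this line, so drop it entirely
            line = f"{key.strip()}={','.join(kept)}"
        cleaned.append(line)
    return "\n".join(cleaned), removed
```

Other options on the same line are kept, and a line that held only the defunct option is removed instead of being left empty.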
XaverStiensmeier commented 5 months ago

The real error occurred on the vpngtw, and probably on the workers as well:

slurmctld: fatal: auth/jwt: cannot stat '/etc/slurm/jwt-secret.key': No such file or directory

The vpngtw may need to be treated separately, since we do not schedule on it anyway. The old Slurm version does not throw this error at all. Instead, it only comments that the vpngtw has no nodename described in the conf, which is correct and should be fixed.

EDIT: Apparently, the vpngtw not being described properly in the slurm.conf was promoted to a fatal error in the new Slurm version. Excluding it properly was enough to fix this issue.
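
One way to picture the condition described above is a guard that checks whether a host appears in a NodeName line of slurm.conf before any Slurm daemon is started on it. The sketch below is an assumption about how such a check could look, not the change that was actually made in BiBiGrid. The function names and the host names in the usage example are hypothetical, and bracketed hostlist ranges such as `worker[1-3]` are deliberately not expanded.

```python
def described_nodes(conf_text):
    """Collect host names listed in NodeName= lines of slurm.conf content.

    Assumption: names are plain comma-separated host names; bracketed
    hostlist expressions are not expanded by this sketch.
    """
    names = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if not line.lower().startswith("nodename="):
            continue
        first_field = line.split()[0]          # "NodeName=a,b"
        value = first_field.split("=", 1)[1]   # "a,b"
        names.update(name for name in value.split(",") if name)
    return names


def is_described_as_node(conf_text, hostname):
    """True if hostname has a NodeName entry, False otherwise."""
    return hostname in described_nodes(conf_text)
```

With a conf that lists only `master`, `worker-0` and `worker-1`, a call such as `is_described_as_node(conf, "vpngtw")` returns `False`, which is the case where the host should be left out of the Slurm setup.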