fgci-org / ansible-role-slurm

For installing and configuring SLURM - Simple Linux Utility for Resource Management
MIT License

Set ConstrainKmemSpace=no by default #98

Closed. jabl closed this 6 years ago

jabl commented 6 years ago

Setting ConstrainKmemSpace=no works around a bug in the Linux kernel that shows up as Slurm failing to create a memory cgroup, with an error message like:

slurmstepd: error: task/cgroup: unable to add task[pid=1234] to memory cg '(null)'

See slurm bug 5082.
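For anyone wanting to apply the workaround by hand, a minimal sketch of an Ansible task might look like the following. ConstrainKmemSpace is a cgroup.conf parameter; the /etc/slurm/cgroup.conf path and the task name are assumptions for illustration, not taken from this role.

```yaml
# Sketch only: the cgroup.conf location is an assumption and varies per site.
- name: Work around the kernel memory cgroup bug
  ansible.builtin.lineinfile:
    path: /etc/slurm/cgroup.conf      # assumed path
    regexp: '^ConstrainKmemSpace='    # replace an existing setting if present
    line: ConstrainKmemSpace=no       # otherwise append the line
```

The role itself presumably writes cgroup.conf from a template, so the actual change would go there rather than in a separate task.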

martbhell commented 6 years ago

Restarted the Travis build after the previous build changes, to see if the builds pass for all Slurm versions.

jabl commented 6 years ago

Ugh, maybe we should make it optional then..

OTOH, do we care? SchedMD supports only the two latest releases, so it's probably a bad idea to use older ones anyway. Currently that would be 18.08 and 17.11: https://groups.google.com/forum/#!topic/slurm-users/kusJTy4T4x8

martbhell commented 6 years ago

Either way is fine with me. Personally I'd like to keep backwards compatibility if it's not too much work. This role does not support 1811 yet I guess.

jabl commented 6 years ago

Personally I think backwards compatibility is fine too. But in this case the choice is between dropping some old releases and making sites explicitly set an option (which most of them will never do) that ought to always be set, at least until the underlying kernel bug is fixed.

martbhell commented 6 years ago

To avoid breaking backwards compatibility, my idea once was to add the setting to the appropriate vars/ files, so that it ends up in everybody's slurm.conf file. Or does that not work?
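A rough sketch of that vars/ idea, under stated assumptions: the variable name `slurm_constrain_kmem_space`, the vars file name, and the cgroup.conf path are all hypothetical and not part of this role. The variable would be defined only in the vars files for Slurm versions that accept the option, and the task skips the line everywhere else.

```yaml
# vars/<version-specific file>.yml (hypothetical file and variable name)
# Quote the value: a bare `no` is read by YAML 1.1 parsers as boolean false.
slurm_constrain_kmem_space: "no"
---
# tasks/main.yml excerpt (sketch): only touch cgroup.conf when the variable
# is defined, so older Slurm versions are left unchanged.
- name: Set ConstrainKmemSpace where the Slurm version supports it
  ansible.builtin.lineinfile:
    path: /etc/slurm/cgroup.conf      # assumed path
    regexp: '^ConstrainKmemSpace='
    line: "ConstrainKmemSpace={{ slurm_constrain_kmem_space }}"
  when: slurm_constrain_kmem_space is defined
```

This keeps the workaround on by default for supported versions, while sites can still override the variable if the kernel bug gets fixed.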

jabl commented 6 years ago

Something like this? (The test on ansible-master fails, though apparently for some unrelated issue.)