Open jhrmnn opened 5 months ago
We are definitely working on improving that prolog script in the next release. But I just wanted to clarify that it runs on every job launch and not node launch. Also, if you don't rely on the azslurm cost feature, or don't use it at all, we suggest just commenting out the PrologSlurmctld line in /etc/slurm/slurm.conf. That way the script simply won't run.
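For anyone who wants to script the suggestion above, here is a minimal sketch, not an official CycleCloud tool. The function name disable_prolog_slurmctld is made up for illustration, and it assumes the setting is written exactly as PrologSlurmctld= at the start of a line (a differently capitalized key would not match). It keeps a .bak copy next to the file before changing anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of CycleCloud or Slurm): comment out any
# active PrologSlurmctld line in a slurm.conf-style file.
# Usage: disable_prolog_slurmctld /etc/slurm/slurm.conf
disable_prolog_slurmctld() {
    conf="$1"
    # Keep a backup next to the original before editing.
    cp "$conf" "$conf.bak" || return 1
    # Prefix matching lines with '#'. Lines that are already commented out
    # start with '#', so they do not match and are left alone.
    # Writing back with '>' keeps the original file's owner and permissions.
    sed 's/^[[:space:]]*PrologSlurmctld[[:space:]]*=/#&/' "$conf.bak" > "$conf"
}
```

Run it as a user who can write to /etc/slurm/slurm.conf. slurmctld then has to reread its configuration before the change takes effect; scontrol reconfigure is the usual way, though I have not checked whether this particular setting needs a full restart on every Slurm version.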
Thanks for confirming! Looking forward to the next release.
But I just wanted to clarify that it runs on every job launch and not node launch.
That makes sense, I'm running 1-node jobs, so the distinction wasn't clear to me.
CycleCloud version: 8.6.2-3276
Slurm version: 22.05.11
AFAIK, CycleCloud's prolog script calls get_acct_info.sh, which calls azslurm accounting_info, and this happens for each launched node. I'm observing that each launch of azslurm accounting_info takes ~150MB of memory, so when launching hundreds of nodes simultaneously, the scheduler can easily run out of memory.

Currently I'm mitigating by commenting out the call to get_acct_info.sh in the prolog script.