ExaWorks / psij-python

MIT License

Fix slurm nodefiles #446

Closed hategan closed 8 months ago

hategan commented 8 months ago

scontrol show hostnames prints only one line per node, no matter how many processes were requested. To bring this in line with the uniform PBS way of doing things (one line per process in the nodefile, possibly with duplicate entries), we need to expand the list manually.
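As a rough illustration of the expansion described above (not the code in this PR), the sketch below turns a one-line-per-node host list into a one-line-per-process list. It assumes the per-node process counts come from a string in the compressed format Slurm uses for SLURM_TASKS_PER_NODE, for example 2(x3),1. The function names parse_tasks_per_node and expand_nodefile are hypothetical.

```python
import re


def parse_tasks_per_node(spec):
    """Expand a compressed count string such as '2(x3),1' into [2, 2, 2, 1]."""
    counts = []
    for part in spec.split(','):
        match = re.fullmatch(r'(\d+)(?:\(x(\d+)\))?', part.strip())
        if match is None:
            raise ValueError(f'unrecognized tasks-per-node entry: {part!r}')
        tasks = int(match.group(1))
        repeat = int(match.group(2) or 1)  # '(xN)' is optional, defaults to 1
        counts.extend([tasks] * repeat)
    return counts


def expand_nodefile(hostnames, tasks_per_node):
    """Return one entry per process, repeating each host as many times as it has tasks."""
    if len(hostnames) != len(tasks_per_node):
        raise ValueError('host list and task counts differ in length')
    lines = []
    for host, count in zip(hostnames, tasks_per_node):
        lines.extend([host] * count)
    return lines


# Hypothetical input: what 'scontrol show hostnames' might print for 3 nodes.
hosts = ['node01', 'node02', 'node03']
nodefile_lines = expand_nodefile(hosts, parse_tasks_per_node('2(x2),1'))
```

Here nodefile_lines lists node01 and node02 twice each and node03 once, which matches the PBS-style layout with duplicate entries.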

This also further fixes environment variable processing. Variables were set before the #SBATCH --export directives, which means that the latter were effectively ignored by Slurm, which stops processing directives once it finds anything other than a comment in the script.
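The ordering constraint can be sketched as follows. This is a hypothetical generator, not the executor's actual template: render_batch_script and its parameters are made up for illustration. It emits all Slurm directive lines (written #SBATCH in batch scripts) directly after the shebang and only then the export statements, so that no directive ends up below an executable line.

```python
import shlex


def render_batch_script(directives, env, command):
    """Build a batch script with every directive placed before any executable line."""
    lines = ['#!/bin/bash']
    # Directives go first: Slurm ignores any that follow a non-comment line.
    lines += [f'#SBATCH {directive}' for directive in directives]
    # Environment variables are set only after the directive block.
    lines += [f'export {name}={shlex.quote(value)}' for name, value in env.items()]
    lines.append(command)
    return '\n'.join(lines) + '\n'


script = render_batch_script(
    directives=['--export=ALL', '--nodes=2'],
    env={'GREETING': 'hello world'},  # hypothetical variable
    command='srun hostname',
)
```

Swapping the two list-building steps would reproduce the bug described above: the export lines would come first and the directives after them would be skipped.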

hategan commented 8 months ago

Duh! on the export fix, sorry for missing that! Thanks for the nodefile mangling.

For reference, I checked the templates for the other executors and they don't seem to have a similar issue, but that's likely because they deal with env vars somewhat differently.