aws-samples / 1click-hpc

Deploy your HPC Cluster on AWS in 20min. with just 1-Click.
MIT No Attribution
62 stars 44 forks source link

Prolog script edits to reduce scaling issues reported in #34 #37

Closed Soham-G closed 1 year ago

Soham-G commented 1 year ago

Issue #, if available: #34

Description of changes: Prolog script is run once on every compute node after it is initialized, so removed for loops and removed need to check if compute nodes up first. Also allowed reduced calls to scontrol (which is discouraged in slurm docs), but still has one call to scontrol per compute node to retrieve job comments. Exploring spank plugins for a better solution in a future commit.

Also updates python version for lambda function causing deployment error as v3.6 no longer supported.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.