Federation is currently supported, but it doesn't behave the way I originally expected.
When jobs are submitted to a federation, my expectation was that they would be scheduled on the origin cluster if resources were available, but that isn't how it works.
Instead, each job is submitted to all of the federated clusters and every cluster tries to schedule it.
Locks prevent multiple clusters from scheduling the same job, but which cluster wins is indeterminate.
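The behavior is visible with the standard federation commands (the job here is just a placeholder):

```
# List the clusters that are members of the federation
sacctmgr show federation

# Submit a trivial job; it is routed to every cluster in the
# federation, not only the origin cluster
sbatch --wrap "sleep 60"

# View the job across all federated clusters; which one ends up
# running it is effectively indeterminate
squeue --federation
```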
The desired behavior is to be able to prioritize Availability Zones (AZs) and regions so that jobs are only scheduled in another AZ or region when the higher priority AZs do not have resources.
For example, one "AZ" may be your on-premises cluster while your other AZs are on AWS.
You would want to use your on-premises compute nodes and only launch AWS instances when on-premises resources are not available.
Another example is a primary AZ where you want jobs to run, with other AZs used for overflow when the primary AZ doesn't have capacity.
Per SchedMD, the solution is to create a single Slurm cluster that has compute nodes in multiple AZs.
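Within a single cluster, Slurm's node Weight parameter is the standard way to express that AZ priority: the scheduler allocates the lowest-weight eligible nodes first. A minimal slurm.conf sketch of the idea; the node names, counts, and weight values are illustrative, not taken from an actual configuration:

```
# On-premises nodes get the lowest weight so they are preferred
NodeName=onprem-[0-9] CPUs=32 Weight=1
# AWS nodes in two AZs act as overflow, tried in priority order
NodeName=aws-use1a-[0-99] CPUs=32 Weight=10 State=CLOUD
NodeName=aws-use1b-[0-99] CPUs=32 Weight=20 State=CLOUD
PartitionName=compute Nodes=onprem-[0-9],aws-use1a-[0-99],aws-use1b-[0-99] Default=YES
```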
The AWS Slurm plugin launches instances based on the name of the compute node, which encodes the attributes of the node such as OS, instance type, and whether it is a spot instance.
This encoding would have to be extended to also include the region and AZ so that when Slurm powers up a node, the plugin knows the region and AZ where the new instance should be launched.
The main concern that I have with this extension is the length of the node name/hostname and whether adding the additional characters will exceed a limit (Linux, for example, caps hostnames at 64 characters via HOST_NAME_MAX).
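To make the length concern concrete, here is a rough Python sketch of what an extended encoding and its length check might look like; the name format, field order, and AZ abbreviations are assumptions for illustration, not the plugin's actual scheme:

```python
# Hypothetical node-name scheme; the real plugin's format may differ.
# Example: "od-c5-xlarge-centos7-use1a-0001" encodes purchase option,
# instance type, OS, AZ (and therefore region), and an index.

HOST_NAME_MAX = 64  # Linux hostname length limit

AZ_ABBREVIATIONS = {
    # Assumed abbreviations to keep names short
    "us-east-1a": "use1a",
    "us-east-1b": "use1b",
    "us-west-2a": "usw2a",
}

def encode_node_name(spot: bool, instance_type: str, os_name: str,
                     az: str, index: int) -> str:
    """Build a node name that also encodes the AZ."""
    purchase = "sp" if spot else "od"
    name = (f"{purchase}-{instance_type.replace('.', '-')}-{os_name}-"
            f"{AZ_ABBREVIATIONS[az]}-{index:04d}")
    # The main concern: stay under the hostname length limit.
    if len(name) > HOST_NAME_MAX:
        raise ValueError(f"node name {name!r} exceeds {HOST_NAME_MAX} chars")
    return name

def decode_az(node_name: str) -> str:
    """Recover the AZ when Slurm powers up the node."""
    abbrev = node_name.split("-")[-2]
    return {v: k for k, v in AZ_ABBREVIATIONS.items()}[abbrev]

print(encode_node_name(False, "c5.xlarge", "centos7", "us-east-1a", 1))
# od-c5-xlarge-centos7-use1a-0001
print(decode_az("od-c5-xlarge-centos7-use1a-0001"))
# us-east-1a
```

With a five-character AZ abbreviation, the extension adds only a handful of characters, so the 64-character limit would likely only be an issue if the existing names are already close to it.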