flux-framework / flux-k8s

Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces
Apache License 2.0

bug: support for affinity rules #58

Open vsoch opened 6 months ago

vsoch commented 6 months ago

When we parse the pod, it looks like we don't take affinity rules into account (e.g., for the Flux Operator here). Regardless of the CPU limits/requests, a pod could have an affinity rule that asks for the entire node. In that case we would ignore the rule and still pass the cpu/memory in via the jobspec here, and fluxion could decide to put two pods on one node (if I understand that correctly). I think affinity rules are typically applied in Filter (the step after PreFilter), and we implement that step here but don't account for them. So we might ignore the affinity rule altogether, which could result in multiple pods per node for the MiniCluster unless the resource limits are also set.
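To make the concern concrete, here is a minimal sketch of the kind of check that seems to be missing. It assumes the "entire node" request is expressed as a required pod anti-affinity term on the `kubernetes.io/hostname` topology key that selects the pod's own labels. I have not confirmed that this is the rule the Flux Operator sets. The `Pod` and `AffinityTerm` types and the `wantsOwnNode` helper are hypothetical, simplified stand-ins so the example runs with only the standard library; a real plugin would read the pod's affinity from the Kubernetes API types instead.

```go
package main

import "fmt"

// AffinityTerm is a simplified, hypothetical stand-in for a required
// pod anti-affinity term. It is NOT the Kubernetes API type.
type AffinityTerm struct {
	TopologyKey string            // e.g. "kubernetes.io/hostname"
	MatchLabels map[string]string // labels of the pods to keep away from
}

// Pod is a simplified, hypothetical stand-in for the parts of a pod
// that matter for this check.
type Pod struct {
	Name                 string
	Labels               map[string]string
	RequiredAntiAffinity []AffinityTerm
}

// hostnameTopologyKey is the well-known node label used for
// per-node topology.
const hostnameTopologyKey = "kubernetes.io/hostname"

// selectsSelf reports whether every label in the term is also set on
// the pod itself. Simplification: an empty selector is treated as
// matching nothing.
func selectsSelf(term AffinityTerm, pod Pod) bool {
	if len(term.MatchLabels) == 0 {
		return false
	}
	for key, value := range term.MatchLabels {
		if pod.Labels[key] != value {
			return false
		}
	}
	return true
}

// wantsOwnNode reports whether the pod carries a required anti-affinity
// term that keeps pods with its own labels off the same node, which is
// the assumed way of asking for one pod per node.
func wantsOwnNode(pod Pod) bool {
	for _, term := range pod.RequiredAntiAffinity {
		if term.TopologyKey == hostnameTopologyKey && selectsSelf(term, pod) {
			return true
		}
	}
	return false
}

func main() {
	labels := map[string]string{"app": "minicluster"}
	exclusive := Pod{
		Name:   "exclusive",
		Labels: labels,
		RequiredAntiAffinity: []AffinityTerm{
			{TopologyKey: hostnameTopologyKey, MatchLabels: labels},
		},
	}
	shared := Pod{Name: "shared", Labels: labels}

	fmt.Println(exclusive.Name, wantsOwnNode(exclusive))
	fmt.Println(shared.Name, wantsOwnNode(shared))
}
```

If a check like this returned true while we parse the pod, one option would be to request a whole node in the jobspec instead of only the cpu/memory from the limits/requests. That is a design idea to discuss, not something the plugin does today.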

For context, I'm trying to brainstorm explanations for the behavior I'm seeing in the latest experiments. It's most likely I did something wrong, but I think there are features of the Flux Operator that need to be taken into account (such as this one). If the default scheduler accounts for affinity, that is at minimum a subtle difference (even if it's not the exact problem here). I think what is likely needed is careful debugging of an entire scheduling session, checking every output. I'll keep trying to think of more subtle differences and open issues as I do.