flux-framework / flux-k8s

Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces
Apache License 2.0

Design Problems for Fluence #68

Open vsoch opened 5 months ago

vsoch commented 5 months ago

I think I've spent over 30 hours on this over the weekend, and I want to write down some concerns I have about #61, which is still not fully working with the new "bulk submit" model.

At a high level, we are trying to fit a model that has state into a framework that largely works against it. We are also trying to enforce the idea of a group of pods in a model where the unit is a single pod. For both of these reasons, I think our model works OK for small, more controlled cases, but we run into trouble with submission en masse (as I'm trying to do). My head is spinning a bit from all these design problems, and I probably need to step away for a bit. Another set (or sets) of eyes would help too.