flux-framework / flux-k8s

Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces
Apache License 2.0

bug: sorting based on PodGroup timestamp with second granularity? #59

Closed vsoch closed 1 month ago

vsoch commented 6 months ago

I'm trying to understand the granularity that we get from metav1.Time, because (based on what I'm seeing) when I submit a huge batch of jobs with multiprocessing (likely within the same second) we get interleaving. I can't think of another reason that we'd get blocking, and consistently for both default and fluence, when the cluster size is close to the job size (or the ratio is about 1/2, so one large job could take up half of the resources). For example, I noticed https://github.com/tilt-dev/tilt/pull/4313, which mentions that some APIs are using metav1.Time, which (according to that PR) is stored with second granularity. Their fix was to use metav1.MicroTime. Specifically:

Currently, metav1.Time is only stored with second-level granularity, which is probably not sufficient for this API.

And indeed the PodGroup is using metav1.Time, as we can see defined here, which in turn wraps here. I think if we want to handle this "spamming the scheduler" case (and not screw up the sort) we also need to use MicroTime: https://github.com/kubernetes/apimachinery/blob/02a41040d88da08de6765573ae2b1a51f424e1ca/pkg/apis/meta/v1/micro_time.go#L31. This also means the PodGroup abstraction is going to have that bug, and (I think) it wasn't an issue before when launching just 3-5 jobs.

What I probably should do is create a new branch off of my current development one, restore some of the cache logic that I was working on with an internal PodGroup, and test a very simple (stupid) approach: create a MicroTime the first time that I see a group go through sort. If that resolves the interleaving, we can be more confident it's related to time. I ran out of extra credits today but should be able to test this locally with kind (I was seeing interleaving there, which is why I abandoned the experimental design in the first place!)

vsoch commented 3 months ago

This will be closed by #69, which uses millisecond granularity.