Closed danielvegamyhre closed 4 months ago
/gcbrun
LGTM. We could also add a check that the JobSet is still active, not just that it exists, or defer that to another PR.
Makes sense. The JobSet utils for checking status etc. are part of the follow-up PR for the deletion controller update, so I'll include that change there.
/gcbrun
/retest
/gcbrun
This PR includes the following changes:
Change node pool naming convention to {first 34 chars of jobset name}-{first 5 chars of job-key}, ensuring it is stable through JobSet restarts and more easily searchable for debugging.
Track the name of the JobSet whose pods triggered the creation of a given node pool. In the node pool garbage collection loop, use this information to decide whether to delete a node pool that is in an ERROR state, has no k8s node objects associated with it, and whose JobSet no longer exists.
Upgrade controller-runtime and other packages to their latest versions. This was required to import the JobSet API package. The previous controller-runtime version was quite old (v0.14.1 vs. the latest v0.18.0), so there were some breaking changes I had to resolve.
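For illustration, the naming convention and the garbage collection condition described above could be sketched roughly as follows. This is a minimal sketch and not the code in this PR: `truncate`, `nodePoolName`, `nodePoolInfo`, and `shouldDeleteNodePool` are hypothetical names, and the sketch assumes names are ASCII so byte slicing is safe.

```go
package main

import "fmt"

// truncate returns at most the first n bytes of s.
// Assumption: s is ASCII, so slicing by byte never splits a character.
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

// nodePoolName (hypothetical helper) builds
// "{first 34 chars of jobset name}-{first 5 chars of job-key}".
// The result depends only on the JobSet name and job-key, and is
// at most 34 + 1 + 5 = 40 characters long.
func nodePoolName(jobSetName, jobKey string) string {
	return fmt.Sprintf("%s-%s", truncate(jobSetName, 34), truncate(jobKey, 5))
}

// nodePoolInfo is a hypothetical summary of what the garbage
// collection loop would need to know about one node pool.
type nodePoolInfo struct {
	Status       string // node pool status, e.g. "ERROR"
	NodeCount    int    // number of k8s node objects in the pool
	JobSetExists bool   // whether the tracked JobSet still exists
}

// shouldDeleteNodePool reports whether a node pool meets all three
// conditions from the description: ERROR state, no associated nodes,
// and a JobSet that no longer exists.
func shouldDeleteNodePool(np nodePoolInfo) bool {
	return np.Status == "ERROR" && np.NodeCount == 0 && !np.JobSetExists
}

func main() {
	fmt.Println(nodePoolName("my-training-jobset", "0a1b2c3d4e5f"))
	fmt.Println(shouldDeleteNodePool(nodePoolInfo{Status: "ERROR"}))
}
```

Deriving the name only from the JobSet name and job-key is what would keep it unchanged when the same JobSet restarts.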
Next steps for follow-up PRs: