Closed: vsoch closed this 5 months ago.
The failure is due to a change to the controller entry point from 5 days ago. Hopefully it won't require updating the Kubernetes component versions. https://github.com/kubernetes-sigs/scheduler-plugins/commit/4d3d41c5f994c9c94b6a21dae306785cbc2df833
I think when we merge https://github.com/flux-framework/fluxion-go/pull/8, that should update the fluence Go bindings to Go 1.21, and then we can attempt the update here. sig-scheduler-plugins is already on Go 1.21: https://github.com/kubernetes-sigs/scheduler-plugins/blob/51d27b6e06b339bfa413d0415d80c86d01097b44/go.mod#L3.
I'm going to try building with that branch. If it works, I'll merge there and update here, and if this passes we can merge into the other PR branch. That's a lot of "ifs" :laughing:
Looks like upstream is still a moving target! They've now bumped Kubernetes to 1.29.x: https://github.com/kubernetes-sigs/scheduler-plugins/commit/2d20310880323ae307312d4d4fdfa78c2267073c. That changed a few function signatures; I'm trying to figure that out now.
Problem: submitting the first index works for jobs with longer, more predictable run times (e.g., LAMMPS takes a while) but had issues with really quick jobs. Solution: try restoring the queue that enables sibling pods, so any group can be scheduled.