flux-framework / flux-k8s

Project to manage Flux tasks needed to standardize kubernetes HPC scheduling interfaces
Apache License 2.0

Flux-Kube CLI and daemon #14

Closed milroy closed 2 years ago

milroy commented 3 years ago

This PR adds all the functionality needed for the Unified Interface project. It features three classes of functionality.

First, it adds a library that maps a Flux jobspec to an OpenShift template. The library takes a Flux jobspec and matches the jobspec command to a corresponding OpenShift template name. It then matches the template parameter overrides specified in the jobspec's system attributes to the overrides available in the selected template. The library returns a service, which is the selected template together with the dictionary of parameter overrides.
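To make the mapping concrete, here is a minimal sketch of the lookup described above. It is not the code in this PR: the function name `jobspec_to_service`, the `templates` dictionary, and the assumption that overrides are stored as a dictionary under `attributes.system["flux-kube"]` are all hypothetical, and rejecting unknown parameters is just one possible policy.

```python
from typing import Any, Dict, List, Tuple


def jobspec_to_service(
    jobspec: Dict[str, Any],
    templates: Dict[str, List[str]],
) -> Tuple[str, Dict[str, str]]:
    """Return (template name, parameter overrides) for a jobspec.

    `templates` maps a template name to the parameter names it accepts
    (hypothetical stand-in for whatever the cluster reports).
    """
    # Treat the first word of the first task's command as the template name.
    command = jobspec["tasks"][0]["command"][0]
    if command not in templates:
        raise KeyError(f"no OpenShift template named {command!r}")

    # Assumption: overrides are a dict under attributes.system["flux-kube"].
    system = jobspec.get("attributes", {}).get("system", {})
    requested = system.get("flux-kube", {})

    # Reject overrides that the selected template does not define.
    unknown = sorted(set(requested) - set(templates[command]))
    if unknown:
        raise ValueError(f"template {command!r} has no parameters {unknown}")

    # Template parameters are passed as strings.
    return command, {key: str(value) for key, value in requested.items()}


example_templates = {"nginx-example": ["NAME", "REPLICAS"]}
example_jobspec = {
    "tasks": [{"command": ["nginx-example"]}],
    "attributes": {"system": {"flux-kube": {"REPLICAS": 2}}},
}
service = jobspec_to_service(example_jobspec, example_templates)
```

With the example inputs, `service` pairs the template name `nginx-example` with the single override `REPLICAS`.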

Second, this PR adds the flux-kube.so jobtap plugin and the associated machinery to build and install it. The jobtap plugin is based on alloc-bypass.so (flux-core 90461710f1f09ed48cc7586abff5cb4fee6dc63d). Upon ingest, the plugin checks whether attributes.system.flux-kube is set in the jobspec. If it is, the plugin sets the "alloc-bypass" flag, passes the jobspec to flux-kube.py via RPC, and unpacks the response. Then the plugin generates a dummy R (since the resources will reside on OpenShift) and commits it to the KVS. Finally, the plugin emits an "alloc" event and then submits the jobspec via asynchronous RPC to the flux kube daemon for creation of OpenShift objects. Cancellation works analogously, sending the stored jobspec to the flux kube daemon via an asynchronous RPC.

Third, this PR adds the fully-featured flux kube CLI and daemon. The CLI features translate, submit, cancel, and daemonize subcommands. Translate uses kube_translate.py to translate a Flux jobspec into an OpenShift template with parameter overrides read from attributes.system. Submit takes either a jobspec or string input and creates the corresponding OpenShift objects. Unfortunately, the OpenShift APIs do not have the functionality (present in the oc binary) needed to convert templates to object YAMLs. Therefore, we must call oc process and pipe its output to oc create via Python subprocess calls. Cancel works similarly to submit, but deletes the OpenShift objects.
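The `oc process | oc create` pipeline can be sketched with the standard `subprocess` module as follows. This is an illustration under stated assumptions rather than the PR's implementation: `build_oc_commands` and `run_pipeline` are hypothetical names, and the template is assumed to be referenced by name in the current project.

```python
import subprocess
from typing import Dict, List, Tuple


def build_oc_commands(
    template: str, overrides: Dict[str, str], action: str = "create"
) -> Tuple[List[str], List[str]]:
    """Build the two argument lists for `oc process ... | oc <action> -f -`."""
    if action not in ("create", "delete"):
        raise ValueError(f"unsupported action {action!r}")
    process_cmd = ["oc", "process", template]
    # Each override becomes a `-p KEY=VALUE` pair; sorted for reproducibility.
    for key, value in sorted(overrides.items()):
        process_cmd += ["-p", f"{key}={value}"]
    # `-f -` makes oc read the processed object list from stdin.
    return process_cmd, ["oc", action, "-f", "-"]


def run_pipeline(
    template: str, overrides: Dict[str, str], action: str = "create"
) -> int:
    """Run the pipeline; return 0 on success, else a nonzero exit code."""
    process_cmd, consume_cmd = build_oc_commands(template, overrides, action)
    producer = subprocess.Popen(process_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consume_cmd, stdin=producer.stdout)
    # Close our copy so the producer sees SIGPIPE if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc or consumer_rc


commands = build_oc_commands("nginx-example", {"REPLICAS": "2", "NAME": "web"})
```

Calling `run_pipeline("nginx-example", {"REPLICAS": "2"})` would create the objects and requires the oc binary and a logged-in session. Passing `action="delete"` mirrors the cancel path by piping the same processed template to `oc delete -f -`.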

Daemonize registers a Flux service that creates a watcher callback for jobspecs sent via RPC from the flux-kube.so jobtap. It creates the OpenShift objects and returns a success/failure value to the jobtap. The daemon is started as a job at rc1 via flux mini submit so that it can be cleaned up when Flux exits.
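The request handling inside the watcher callback might look roughly like the sketch below. It deliberately leaves out the Flux service registration and message plumbing. The payload field `jobspec`, the response fields `success` and `errmsg`, and the injected `create_objects` callable are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict


def handle_create_request(
    payload: Dict[str, Any],
    create_objects: Callable[[Dict[str, Any]], None],
) -> Dict[str, Any]:
    """Create OpenShift objects for one request and report success or failure."""
    jobspec = payload.get("jobspec")
    if jobspec is None:
        return {"success": False, "errmsg": "request has no jobspec"}
    try:
        # `create_objects` stands in for the oc process / oc create step.
        create_objects(jobspec)
    except Exception as exc:
        # Report the failure to the caller instead of crashing the daemon.
        return {"success": False, "errmsg": str(exc)}
    return {"success": True, "errmsg": ""}


def failing_create(jobspec: Dict[str, Any]) -> None:
    """Fake creator that simulates a failed `oc create`."""
    raise RuntimeError("oc create failed")


created = []
ok_response = handle_create_request({"jobspec": {"version": 1}}, created.append)
bad_response = handle_create_request({"jobspec": {"version": 1}}, failing_create)
```

Keeping the handler free of Flux calls means it can be exercised with fake creators like the ones above. The daemon would then only need to decode the RPC payload, call the handler, and send the returned dictionary back to the jobtap.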

Much of the functionality of this PR requires OpenShift templates. The functionality will be extended to interface with base K8s CRDs (which can replicate the template functionality) in the future.

milroy commented 3 years ago

I added a getter class for the templates since I realized it's pretty useful to be able to print the available templates and their parameter overrides. I also restructured the class inheritance for simplicity and initialization performance.

milroy commented 3 years ago

The latest force pushes introduce functionality that implements oc get for OpenShift object types.

They also include mini submit and mini cancel subcommands, which behave like flux mini submit in that they generate a skeleton jobspec from the specified OpenShift template and parameter overrides.
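As a rough illustration of what such a skeleton jobspec could contain, the sketch below builds a minimal version 1 jobspec whose command is the template name. The function name `skeleton_jobspec`, the single-core resource request, and the placement of overrides under `attributes.system["flux-kube"]` are assumptions, not details confirmed by this PR.

```python
import json
from typing import Any, Dict


def skeleton_jobspec(template: str, overrides: Dict[str, str]) -> Dict[str, Any]:
    """Build a minimal version 1 jobspec naming an OpenShift template."""
    return {
        "version": 1,
        # Placeholder resources: one slot holding one core.
        "resources": [
            {
                "type": "slot",
                "count": 1,
                "label": "task",
                "with": [{"type": "core", "count": 1}],
            }
        ],
        # The template name takes the place of the command.
        "tasks": [
            {"command": [template], "slot": "task", "count": {"per_slot": 1}}
        ],
        # Assumption: overrides travel under attributes.system["flux-kube"].
        "attributes": {
            "system": {"duration": 0, "flux-kube": dict(overrides)}
        },
    }


skeleton = skeleton_jobspec("nginx-example", {"REPLICAS": "2"})
encoded = json.dumps(skeleton)
```

The `encoded` string is the JSON form that could then be handed to a submit step.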

milroy commented 2 years ago

Note that I can add a new commit to change the file name to flux-oc.py, which will make the CLI invocation flux oc per the discussion several weeks ago with @dongahn.

milroy commented 2 years ago

Thank you for the thorough and helpful reviews @SteVwonder, @dongahn, and @grondo! I think I've addressed all but two issues so you can give this PR a final pass.

Regarding the other two issues, perhaps we can discuss during coffee time?

milroy commented 2 years ago

Thank you to everyone for your thorough and helpful reviews! This contribution is much more fully-featured and hardened than it would have been otherwise.