cnvrg / metagpu

K8s device plugin for GPU sharing
https://cnvrg.io
MIT License
96 stars, 9 forks

Helm chart redesign and mgctl injection via hostpath mounting #4

Status: Open. manfuin opened this pull request 1 year ago.


This is a complete redesign of the Helm chart for deploying the metagpu device plugin.

Compared to the mostly static manifests in the current version, this chart makes full use of Helm templating, in the idiomatic Helm way. And of course it is now possible to install any number of releases of the chart if needed (resource names are parametrized instead of hardcoded).

Flexibility

The proposed Helm chart has a rich values.yaml that provides plugin configuration flexibility from a single file. I have added comments to the values according to my understanding of their meaning :) It may be worth reading through the comments carefully before merging.

Also, since this is the result of my attempt to get it working on our setup, it includes some small pieces of extra functionality. Small additions at the Helm chart level, such as extraEnv, are self-explanatory in values.yaml.

mgctl injection via hostpath mounting

This feature is worth highlighting separately. It might deserve a dedicated PR, but since it is tied to the Helm chart as well, I am too lazy to split it out at this point. I hope you find it useful and that we can merge everything in one batch to avoid spending time on a formal split.

Motivation: cp/chmod-based injection does not always work, because /usr/bin is not always writable, the required tools are not always present in the container, etc.

The device plugin API allows mounting a hostPath into the container (nvidia-smi and the CUDA libraries, for example, are mounted this way). The workflow for metagpu:

  1. On DaemonSet pod startup, copy mgctl to the host directory.
  2. Use the plugin API to inject a mount of mgctl from the host directory.

Advantages:

In any case, the feature is optional in both the Helm chart and mgdp, so end users can choose their preferred injection method.
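On the mgdp side, the choice could look something like the sketch below. The mode names, the environment variable `MGCTL_INJECTION`, the default of copy mode, and the helper functions are all hypothetical; the PR does not specify how the switch is wired between the Helm values and the plugin.

```go
package main

import (
	"fmt"
	"os"
)

// InjectionMode selects how mgctl reaches the workload container.
type InjectionMode string

const (
	ModeCopy  InjectionMode = "copy"  // cp/chmod into the container
	ModeMount InjectionMode = "mount" // hostPath mount via the device plugin API
)

// parseMode validates a mode string, e.g. one passed from a Helm value into
// the plugin's environment. Treating an empty value as copy mode is an
// assumption made for this sketch.
func parseMode(s string) (InjectionMode, error) {
	switch m := InjectionMode(s); m {
	case ModeCopy, ModeMount:
		return m, nil
	case "":
		return ModeCopy, nil
	default:
		return "", fmt.Errorf("unknown mgctl injection mode %q", s)
	}
}

// hostMounts returns container-path -> host-path mounts to request from the
// kubelet; only mount mode adds one.
func hostMounts(mode InjectionMode, hostBin, containerPath string) map[string]string {
	if mode != ModeMount {
		return nil
	}
	return map[string]string{containerPath: hostBin}
}

func main() {
	// MGCTL_INJECTION is a hypothetical variable name.
	mode, err := parseMode(os.Getenv("MGCTL_INJECTION"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(mode, hostMounts(mode, "/var/lib/metagpu/bin/mgctl", "/usr/local/bin/mgctl"))
}
```

Rejecting unknown values up front keeps a typo in values.yaml from silently falling back to the other injection method.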