NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

Documentation for preparing the prerequisite environment #36

Open CoderTH opened 10 months ago

CoderTH commented 10 months ago

In the README, the quick start uses a kind cluster that is launched with a single command and has the whole environment already set up. I tried it and it works well.

But when I tried to build my own cluster manually and run the same example, I hit all kinds of unexpected errors. Could we write a document that describes the manual setup?

The general steps for my installation are as follows:

  1. Install the NVIDIA driver.
  2. Install a Kubernetes cluster, version 1.28.
  3. Manually enable DRA support. The components I modified are kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet.
  4. Install nvidia-container-runtime and set enable_cdi = true.
  5. Manually build the DRA driver image and install it.

Even following this process, I still ran into a lot of unexpected problems. I don't know whether I did something wrong, so I would like a document that describes the setup and helps a novice like me get it working.
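
For reference, step 3 above can be sketched as a small Python helper that prints the flags each component would need. This is only a sketch under assumptions, not the project's documented procedure: it assumes Kubernetes 1.28, where DRA is gated by the `DynamicResourceAllocation` feature gate and the API group is served as `resource.k8s.io/v1alpha2`. The helper name `dra_flags` is hypothetical. How the flags actually reach each component (static pod manifests, kubeadm configuration, or the kubelet configuration file) depends on how the cluster was installed.

```python
from typing import List

# Components that need the DRA feature gate (step 3 in the list above).
COMPONENTS = [
    "kube-apiserver",
    "kube-controller-manager",
    "kube-scheduler",
    "kubelet",
]

# Feature gate that switches on Dynamic Resource Allocation.
FEATURE_GATE = "DynamicResourceAllocation=true"

# Assumption: on Kubernetes 1.28 the DRA API group is resource.k8s.io/v1alpha2
# and has to be enabled explicitly on the API server.
RUNTIME_CONFIG = "resource.k8s.io/v1alpha2=true"


def dra_flags(component: str) -> List[str]:
    """Return the command-line flags a component needs for DRA (hypothetical helper)."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    flags = [f"--feature-gates={FEATURE_GATE}"]
    # Only the API server serves API groups, so only it gets --runtime-config.
    if component == "kube-apiserver":
        flags.append(f"--runtime-config={RUNTIME_CONFIG}")
    return flags


# Print one line per component, ready to compare against the running cluster.
for name in COMPONENTS:
    print(name, " ".join(dra_flags(name)))
```

Comparing this output against the arguments of the running components is a quick way to spot a component where the gate was missed.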

parthyadav3105 commented 9 months ago

I can add this to the README; I was able to get it working previously.

anencore94 commented 5 months ago

Is there a minimum required Kubernetes version for k8s-dra-driver, e.g. v1.27, v1.28, or v1.29?