NVIDIA / k8s-kata-manager

Apache License 2.0
17 stars, 6 forks

after helm install gpu-operator, no kata-qemu-nvidia-gpu runtimeclass, only kata-nvidia-gpu #59

Open acblbtpccc opened 3 months ago

acblbtpccc commented 3 months ago

OS: Ubuntu 20.04
CPU: AMD EPYC 9354
GPU: NVIDIA RTX A6000 * 8

image

I have already labeled the node (master and worker are on the same machine).

image
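For anyone reproducing this, here is a minimal sketch for confirming the node label from the JSON that `kubectl get nodes -o json` prints. The label key and value (`nvidia.com/gpu.workload.config=vm-passthrough`) are an assumption based on the GPU Operator Kata docs, not something shown in the post itself, and `nodes_with_label` is a hypothetical helper name.

```python
import json
from typing import List

# Assumption: label key/value as described in the GPU Operator Kata docs.
# Adjust these if your operator version expects something different.
LABEL_KEY = "nvidia.com/gpu.workload.config"
LABEL_VALUE = "vm-passthrough"


def nodes_with_label(nodes_json: str,
                     key: str = LABEL_KEY,
                     value: str = LABEL_VALUE) -> List[str]:
    """Return the names of nodes whose labels contain key=value.

    nodes_json is the text printed by `kubectl get nodes -o json`.
    """
    items = json.loads(nodes_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item.get("metadata", {}).get("labels", {}).get(key) == value
    ]
```

To use it, save the output of `kubectl get nodes -o json` to a file, read it as text, and pass it to `nodes_with_label`. An empty list would suggest the label is missing or has a different value.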

If I use the kata-qemu-nvidia-gpu runtimeclass (which is the one in the docs for 24.3.0), the pod cannot start:

image

If I use the kata-nvidia-gpu runtimeclass (which is not in the docs for 24.3.0), the output is as follows:

image image image
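A minimal sketch for checking which of the two runtime classes actually exist on a cluster, given the JSON printed by `kubectl get runtimeclass -o json`. The helper names are hypothetical; the only fields read are `metadata.name` and `handler` of each RuntimeClass object.

```python
import json
from typing import Dict, Iterable, List


def runtime_class_handlers(runtimeclass_json: str) -> Dict[str, str]:
    """Map each RuntimeClass name to its handler.

    runtimeclass_json is the text printed by
    `kubectl get runtimeclass -o json`.
    """
    items = json.loads(runtimeclass_json).get("items", [])
    return {item["metadata"]["name"]: item.get("handler", "") for item in items}


def missing_runtime_classes(runtimeclass_json: str,
                            expected: Iterable[str]) -> List[str]:
    """Return the expected RuntimeClass names that are not on the cluster."""
    present = runtime_class_handlers(runtimeclass_json)
    return [name for name in expected if name not in present]
```

Calling `missing_runtime_classes(text, ["kata-qemu-nvidia-gpu", "kata-nvidia-gpu"])` on a cluster in the state described here would be expected to report only `kata-qemu-nvidia-gpu` as missing.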

After comparing the Helm manifests, I suspect the difference is due to the kata-manager version.

image image

The Helm command I used is:

image

The results above seem to indicate that the docs were written for kata-manager v0.1.0 rather than kata-manager v0.2.0. Is there any documentation for kata-manager v0.2.0? Or can I downgrade to kata-manager v0.1.0?

acblbtpccc commented 3 months ago

Hi, I found that this problem occurs because the artifact image that k8s-kata-manager needs is no longer accessible.

image

image

Does anyone have the permissions to fix this?

goutnet commented 1 month ago

@zvonkok Hi, I am a colleague of @acblbtpccc. We are trying to reproduce the steps in NVIDIA's documentation here:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html

Sorry for the bump on an old issue, I think we could have done better introducing ourselves ^^;

Would you have a few minutes to spare to give us some pointers on what we did wrong here?

@zvonkok your help would be greatly appreciated, thank you so much in advance!

acblbtpccc commented 1 month ago

@zvonkok Hi Zvonkok,

I hope this message finds you well. I wanted to let you know that I've opened a related issue, https://github.com/kata-containers/kata-containers/issues/10360, about running GPU passthrough directly from Kata Containers. I would greatly appreciate it if you could take a look when you have a moment. I look forward to your insights, and thank you in advance for your time and expertise.

Additionally, I watched your interview videos on YouTube, which were very informative. If possible, would you be willing to share the environment configuration you used? It would be incredibly helpful as a reference when we try to reproduce the setup.

Thank you again for your consideration and assistance.

@goutnet

acblbtpccc commented 1 month ago

@cdesiniotis

Hi Christopher, I noticed your comments in this issue. Are these artifacts still not publicly accessible? Does this mean we are still unable to reproduce the results in the official docs?

We would welcome your insights on some challenges we've run into while using the GPU Operator with Kata. Your expertise would be greatly appreciated.

Thank you in advance for your time and assistance.

/cc @goutnet