acblbtpccc opened this issue 3 months ago
Hi, I found that this problem is caused by the artifact image needed by the k8s-kata-manager no longer being accessible.
May I ask if anyone has the rights to fix this?
@zvonkok Hi, I am a colleague of @acblbtpccc. We are trying to directly reproduce the steps from the documentation provided by NVIDIA here:
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kata.html
Sorry for bumping an old issue; I think we could have done better introducing ourselves ^^;
Would you have a few minutes to spare to give us some pointers on what we obviously did wrong on this?
@zvonkok your help would be greatly appreciated, thank you so much in advance!
@zvonkok Hi Zvonkok,
I hope this message finds you well. I wanted to bring to your attention that I've opened a related issue https://github.com/kata-containers/kata-containers/issues/10360 when attempting to run Kata Containers directly with GPU passthrough. I would greatly appreciate it if you could take a look at this issue when you have a moment. I'm looking forward to your insights, and thank you in advance for your time and expertise.
Additionally, I watched your interview videos on YouTube, which were very informative. If possible, would you be willing to share the environment configuration you used? This would be incredibly helpful for us to reference when trying to reproduce the setup.
Thank you again for your consideration and assistance.
@goutnet
@cdesiniotis
Hi Christopher, I noticed your comments in this issue. Are these artifacts still not publicly accessible? Does this mean we are still unable to reproduce the results in the official docs?
We are looking forward to your insights regarding some challenges we've encountered while using GPU-Operator with Kata. Your expertise would be greatly appreciated.
Thank you in advance for your time and assistance.
/cc @goutnet
OS: Ubuntu 20.04
CPU: AMD EPYC 9354
GPU: NVIDIA RTX A6000 * 8
I have already labeled the node (the master and worker are on the same machine).
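For reference, the labeling step we followed is roughly the one from the linked docs (the node name below is a placeholder for our single combined master/worker node):

```sh
# Label the node so the GPU Operator configures it for VM passthrough (Kata) workloads.
# Label key/value as described in the gpu-operator-kata documentation; node name is ours.
kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
```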
If I use kata-qemu-nvidia-gpu (which is included in the docs for 24.3.0), the pod cannot start.
If I use the kata-nvidia-gpu runtime class (which is not in the docs for 24.3.0), the output is as follows:
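For context, the kind of test pod we are launching against these runtime classes looks roughly like this (a minimal sketch; the image and the single-GPU resource request are our own choices, not taken from the docs):

```sh
# Sketch of a minimal test pod using the kata-qemu-nvidia-gpu runtime class.
# The sample image and the nvidia.com/gpu resource name are assumptions on our side.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd-kata
spec:
  runtimeClassName: kata-qemu-nvidia-gpu
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```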
After comparing the Helm manifests, I guess that the difference may be due to the kata-manager version.
The Helm commands used are:
The results above seem to indicate that the docs are for kata-manager v0.1.0 rather than kata-manager v0.2.0. May I ask whether there is any documentation for kata-manager v0.2.0? Or can I downgrade to kata-manager v0.1.0?
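If downgrading is an option, something like the following might work, assuming the gpu-operator chart exposes the k8s-kata-manager image version as a chart value (we have not verified the exact value names against the chart release we are using):

```sh
# Sketch: install the GPU Operator with Kata support and pin the k8s-kata-manager image to v0.1.0.
# The value names (sandboxWorkloads.enabled, kataManager.enabled, kataManager.version)
# are assumptions based on the chart's values.yaml and may differ between releases.
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set sandboxWorkloads.enabled=true \
  --set kataManager.enabled=true \
  --set kataManager.version=v0.1.0
```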