Closed asm582 closed 9 months ago
The resource class is wrong: gpu.example.com
it should be gpu.nvidia.com
The resource class is wrong:
gpu.example.com
it should be gpu.nvidia.com
Thanks @klueska. As you can see, I am running an example driver with simulated GPUs. Are you saying immediate mode only works with real GPUs?
Are you referring to https://github.com/kubernetes-sigs/dra-example-driver? If so, we should migrate this issue there instead. This repository is for the NVIDIA GPU-specific DRA driver implementation.
Support is not yet merged for it in the example driver. See https://github.com/kubernetes-sigs/dra-example-driver/pull/4
In any case, I got confused because (as Evan said) you opened the issue against this repo rather than the example driver repo, so I assumed you were using the NVIDIA DRA driver rather than the example one.
Sorry for the confusion. The reason I raised the issue here is that I saw this log line:
If we think immediate mode works, I can certainly move the issue to the desired repository. Thanks.
Hello, we tried this on real nodes and got the status below when exercising claims in Immediate mode:
[root@nvd-srv-02 k8s-dra-driver]# kubectl describe resourceclaim gpu.nvidia.com -n gpu-test1
Name:         gpu.nvidia.com
Namespace:    gpu-test1
Labels:       <none>
Annotations:  <none>
API Version:  resource.k8s.io/v1alpha2
Kind:         ResourceClaim
Metadata:
  Creation Timestamp:  2023-11-29T17:54:02Z
  Finalizers:
    gpu.resource.nvidia.com/deletion-protection
  Resource Version:  7898
  UID:               066b4c8f-a174-45eb-a1b7-9b4ad78a0f17
Spec:
  Allocation Mode:      Immediate
  Resource Class Name:  gpu.nvidia.com
Status:
Events:
  Type     Reason  Age                 From                                     Message
  ----     ------  ----                ----                                     -------
  Warning  Failed  21s (x14 over 62s)  resource driver gpu.resource.nvidia.com  allocate: TODO: immediate allocations not yet supported
Could you please share what we are missing?
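For reference, a claim matching the describe output above would look roughly like the following. This is a minimal sketch reconstructed from the fields shown (name, namespace, API version, allocation mode, resource class name); the manifest actually applied was not shared, so any fields it may have carried beyond these, such as a parameters reference, are unknown and omitted here.

```yaml
# Sketch only: reconstructed from the `kubectl describe` output, not the
# original manifest. Field names follow the resource.k8s.io/v1alpha2 API.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: gpu.nvidia.com
  namespace: gpu-test1
spec:
  # Immediate requests allocation as soon as the claim is created,
  # instead of waiting for a pod that consumes it.
  allocationMode: Immediate
  resourceClassName: gpu.nvidia.com
```

In the v1alpha2 API, leaving out `allocationMode` defaults the claim to `WaitForFirstConsumer`, which defers allocation until a pod referencing the claim is scheduled.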
You aren't missing anything:
allocate: TODO: immediate allocations not yet supported
We haven't added support for immediate mode yet.
Thanks. Do we know when immediate mode will be supported in NVIDIA's DRA driver implementation?
Ping! Can we request a roadmap of the features planned for NVIDIA's DRA implementation? For our use case, we see allocation mode as an important feature.
There is no concrete roadmap at the moment. Rapid development on this driver has been paused due to the issues that have come up with getting DRA promoted to beta upstream. All efforts have been shifted to ensuring this happens in as timely a manner as possible. We will, of course, continue to develop this driver, but it is more important to ensure that DRA happens at all than to keep adding features here.
We are trying to use allocation mode Immediate, but it does not work. We see claims created, but claims are not generated on the node.