NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

When does k8s-device-plugin release GPU resources? #290

Open geekidentity opened 2 years ago

geekidentity commented 2 years ago

1. Issue or feature description

We track GPU usage with Prometheus monitoring. Every k8s node has 2 K40 GPUs, but sometimes container_accelerator_duty_cycle shows three K40 records. As shown below, the yellow record ends at 12.06, but the blue record starts at 12.02 (by which point the yellow record's GPU usage is already zero). I think the Pod releases the GPU while it is in Terminating status? I want to know when k8s-device-plugin releases GPU resources.

image

klueska commented 2 years ago

The device plugin doesn’t allocate or release resources itself. It just tells the kubelet what’s available and the kubelet does the allocating/freeing. At present, freeing is done by the kubelet in a garbage collection loop that is triggered the next time a new device is requested by some future pod. It is not in sync with the container lifecycle.
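To make the timing concrete, here is a toy Python model of the behaviour described above. It is only a sketch under stated assumptions, not the kubelet's actual code: the class `ToyDeviceManager` and all of its method names are hypothetical and exist purely for illustration. It shows devices staying recorded as allocated after a pod terminates, and being reclaimed only when the next allocation request arrives.

```python
class ToyDeviceManager:
    """Toy model of lazy device reclamation (hypothetical, NOT kubelet code)."""

    def __init__(self, device_ids):
        self.free = set(device_ids)
        self.allocated = {}      # pod name -> set of device IDs
        self.terminated = set()  # pods that finished but are not yet reclaimed

    def pod_terminated(self, pod):
        # Termination only marks the pod; its devices stay recorded as allocated.
        if pod in self.allocated:
            self.terminated.add(pod)

    def _garbage_collect(self):
        # Return the devices of all terminated pods to the free pool.
        for pod in self.terminated:
            self.free |= self.allocated.pop(pod)
        self.terminated.clear()

    def allocate(self, pod, count):
        # Reclamation happens here, triggered by a new device request.
        self._garbage_collect()
        if count > len(self.free):
            raise RuntimeError("not enough free devices")
        devices = set(sorted(self.free)[:count])
        self.free -= devices
        self.allocated[pod] = devices
        return devices

    def in_use(self):
        # Number of devices currently recorded as allocated.
        return sum(len(devs) for devs in self.allocated.values())


mgr = ToyDeviceManager(["gpu0", "gpu1"])
mgr.allocate("pod-a", 2)
mgr.pod_terminated("pod-a")
print(mgr.in_use())  # still 2: nothing has requested a device since
mgr.allocate("pod-b", 1)
print(mgr.in_use())  # 1: pod-a's devices were reclaimed by this request
```

In this model, the recorded allocation can lag behind the moment a pod stops using its GPU, which would be consistent with a metric dropping to zero before the device is actually freed.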

geekidentity commented 2 years ago

If I keep using Prometheus for counting, can I assume that a GPU has been released when its recorded usage is zero? For counting how many GPUs are in use on a node, do you have any other suggestions that would help me get the right number? I don't want to get a wrong count from the Prometheus records.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.