Open geekidentity opened 2 years ago
The device plugin doesn’t allocate or release resources itself. It just tells the kubelet what’s available and the kubelet does the allocating/freeing. At present, freeing is done by the kubelet in a garbage collection loop that is triggered the next time a new device is requested by some future pod. It is not in sync with the container lifecycle.
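To make the lazy freeing concrete, here is a toy Python model of the behaviour described above. It is a simplification under my own assumptions, not the kubelet's actual device manager code. The class `ToyDeviceManager`, its methods, and the `active_pods` argument are hypothetical. What it shows is that a finished pod keeps its device in the kubelet's bookkeeping until some later pod requests a device.

```python
# Toy model (NOT kubelet source code) of the behaviour described above:
# the device plugin only advertises devices; the kubelet records which pod
# holds which device and reclaims devices of finished pods lazily, only
# when some pod next asks for a device.


class ToyDeviceManager:
    def __init__(self, devices):
        self.healthy = set(devices)  # devices advertised by the plugin
        self.allocated = {}          # pod name -> set of device IDs

    def free_devices(self):
        # set().union() with no arguments returns an empty set
        in_use = set().union(*self.allocated.values())
        return self.healthy - in_use

    def allocate(self, pod, count, active_pods):
        # "Garbage collection" runs here, in the allocation path,
        # and only drops pods that are no longer active.
        for name in list(self.allocated):
            if name not in active_pods:
                del self.allocated[name]
        free = sorted(self.free_devices())
        if len(free) < count:
            raise RuntimeError(f"not enough devices for {pod}")
        chosen = set(free[:count])
        self.allocated[pod] = chosen
        return chosen


mgr = ToyDeviceManager(["gpu0", "gpu1"])
mgr.allocate("pod-a", 1, active_pods={"pod-a"})
# pod-a has finished, but nothing has been reclaimed yet:
print(mgr.allocated)          # {'pod-a': {'gpu0'}}
# The next request triggers cleanup of pod-a's device:
mgr.allocate("pod-b", 2, active_pods={"pod-b"})
print(sorted(mgr.allocated))  # ['pod-b']
```

In this model, any external view of "which GPU belongs to which pod" lags behind the container lifecycle, which is the point made in the reply above.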
If I keep using Prometheus to count, can I assume a GPU is released once its recorded usage drops to zero? When counting how many GPUs are in use on a node, do you have any other suggestions to help me get the right number? I don't want to count the wrong number from the Prometheus records.
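One alternative to inferring allocation from usage samples is to count what the pods on the node have requested. Below is a minimal sketch. It assumes you already have the pods as Kubernetes API JSON (for example the `items` of `kubectl get pods -o json`) and that GPUs are requested through the `nvidia.com/gpu` resource in container limits. `gpus_allocated_on_node` is a hypothetical helper. For simplicity it ignores init containers. It treats pods in the `Succeeded` or `Failed` phase as no longer holding GPUs, while still counting pods that are Terminating.

```python
# Hypothetical helper: sums the GPU limits of non-terminal pods bound to a
# node, instead of inferring allocation from Prometheus usage samples.
# Pod dicts follow the shape of Kubernetes API JSON.

GPU_RESOURCE = "nvidia.com/gpu"           # assumed resource name
TERMINAL_PHASES = {"Succeeded", "Failed"}


def gpus_allocated_on_node(pods, node_name):
    total = 0
    for pod in pods:
        spec = pod.get("spec", {})
        status = pod.get("status", {})
        if spec.get("nodeName") != node_name:
            continue
        # Terminal pods are skipped; Terminating pods (still Running)
        # are counted, because they may still hold their GPUs.
        if status.get("phase") in TERMINAL_PHASES:
            continue
        # Simplification: init containers are not considered.
        for container in spec.get("containers", []):
            limits = container.get("resources", {}).get("limits", {})
            total += int(limits.get(GPU_RESOURCE, 0))
    return total


def gpu_pod(node, phase, gpus):
    # Builds a minimal pod dict for the example below.
    return {
        "spec": {
            "nodeName": node,
            "containers": [{"resources": {"limits": {GPU_RESOURCE: str(gpus)}}}],
        },
        "status": {"phase": phase},
    }


pods = [
    gpu_pod("node-1", "Running", 1),
    gpu_pod("node-1", "Succeeded", 1),  # finished: not counted
    gpu_pod("node-2", "Running", 2),    # other node: not counted
]
print(gpus_allocated_on_node(pods, "node-1"))  # 1
```

This counts what has been requested by live pods, which is not exactly the same as what the kubelet has already reclaimed, but it does not depend on whether a GPU's usage happens to be zero.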
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
1. Issue or feature description
We count GPU usage with Prometheus monitoring. Every k8s node has 2 K40 GPUs, but I find that sometimes the container_accelerator_duty_cycle metric has three K40 records. As shown below, the yellow record ends at 12.06, but the blue record starts at 12.02 (while the yellow record's GPU usage is already zero). Does a pod release its GPU as soon as it enters Terminating status? I want to know when k8s-device-plugin releases GPU resources.
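One possible reason for seeing three records on a two-GPU node is that the same physical GPU appears under two container series while the old and new pods overlap. The sketch below counts distinct GPU IDs per node instead of counting series. It assumes each series carries an `acc_id` label identifying the GPU and an `instance` label identifying the node. Check which labels your cAdvisor/kubelet actually exposes. The function name and the sample data are hypothetical.

```python
# Hypothetical post-processing of container_accelerator_duty_cycle series.
# Each series is represented only by its label set; the label names
# ("instance", "acc_id", "pod") are assumptions about your setup.

from collections import defaultdict


def distinct_gpus_per_node(series):
    seen = defaultdict(set)  # node -> set of GPU IDs
    for labels in series:
        seen[labels["instance"]].add(labels["acc_id"])
    return {node: len(ids) for node, ids in seen.items()}


series = [
    {"instance": "node-1", "acc_id": "GPU-aaa", "pod": "old-pod"},
    {"instance": "node-1", "acc_id": "GPU-bbb", "pod": "other-pod"},
    {"instance": "node-1", "acc_id": "GPU-aaa", "pod": "new-pod"},  # same GPU
]
print(len(series))                     # 3 series on a 2-GPU node
print(distinct_gpus_per_node(series))  # {'node-1': 2}
```

Under the same label assumptions, a similar deduplication in PromQL would be `count by (instance) (count by (instance, acc_id) (container_accelerator_duty_cycle))`. This still counts GPUs that have a series, not GPUs the kubelet currently considers allocated.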