I am running some experiments using NVML and a CUDA GEMM implementation to study power consumption. I measured the following power trend for the multiplication of two 16384×16384 square matrices. The horizontal axis is time in seconds and the vertical axis is power in Watts. The measurements are made on a single Tesla V100S.
As can be seen from the plot, power consumption in the idle state, before the CUDA kernel is launched, is roughly 25 W. After the CUDA kernel finishes and memory is deallocated, the power settles at roughly 50 W instead of returning to the original 25 W idle value. In the "Finished" state the GPU is completely free and all device allocations have been released. The measurements are made with the NVML C API.
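For context, the power sampling is done with a polling loop along these lines (a minimal sketch, not my exact code; device index 0, the 10 ms sampling period, and the sample count are assumptions):

```cpp
// Minimal NVML power-polling sketch.
// Assumptions: device index 0, 10 ms sampling period, fixed sample count.
// Link against the NVML library (e.g. -lnvidia-ml).
#include <nvml.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex_v2(0, &dev) != NVML_SUCCESS) {
        nvmlShutdown();
        return 1;
    }

    for (int i = 0; i < 1000; ++i) {
        unsigned int mw = 0;  // instantaneous power draw in milliwatts
        if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
            std::printf("%d,%u\n", i, mw);  // CSV: sample index, power in mW
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    nvmlShutdown();
    return 0;
}
```

The CSV output is then converted to Watts (NVML reports milliwatts) and plotted against time.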
Is there something that I'm missing here?
Thanks