NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

questions about dynamic power state machine #731

Closed nju-zjx closed 2 days ago

nju-zjx commented 5 days ago

Thanks for your contributions to open source. We know that when there are no tasks on the GPU, it goes into an idle state, and during this time, if you check the GPU's link speed with lspci, it shows as Gen1. The file dynamic_power.c thoroughly explains the dynamic power state mechanism. Is this power management related to the idle state mentioned above?

mtijanic commented 5 days ago

Hi there. There are two related but still distinct concepts when it comes to power management. The stuff in dynamic-power.c relates to RTD3/GC6 deep idle states. In these cases, the GPU is effectively powered off (except a few minor power islands). It certainly won't support anything above Gen1 in this state.

However, even if the GPU is not in a deep idle state, the PCIe link speed depends on the clocks/powerdraw of the GPU. For example, here is the difference when running the GPU in different pstates on my system:

$ nvidia-smi --query-gpu="pstate,pcie.link.gen.gpucurrent,pcie.link.gen.gpumax" --format=csv -l 1
pstate, pcie.link.gen.gpucurrent, pcie.link.gen.gpumax
P8, 1, 4
P5, 2, 4
P2, 3, 4
P0, 3, 4

(P0 is highest performance state, P8 is lowest on this GPU)

Hope this helps!

nju-zjx commented 5 days ago

Thank you for the answer. I would like to ask whether the dynamic adjustment of pstates is covered in the open-source driver code, for example the switch between P8 and P0 as clocks/power draw change. If so, in which .c file can it be found?

mtijanic commented 5 days ago

The actual pstates are changed by the PMU depending on the GPU load. This algorithm isn't part of these modules (nor the GSP code).

However, applications do have some tools available to instrument and modify these. This API is described here: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080perf.h

Here is a very hacked-up minimal example that uses the above API to set the pstate to P0 while the app is running: https://gist.github.com/mtijanic/9c129900bfba774b39914ad11b0041f6
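
In case you can't reach it, here is roughly what the example does. This is a from-memory sketch, not the gist verbatim: the NvRmAllocRoot/NvRmAlloc/NvRmControl call shapes, the alloc-parameter structs, and the BOOST_TO_MAX flag name are approximations, so check ctrl2080perf.h and the SDK headers for the real definitions.

// Sketch only: no error handling, NVOS alloc-parameter structs omitted.
NvHandle hClient = 0, hDevice = 0xbeef0001, hSubdevice = 0xbeef0002;

NvRmAllocRoot(&hClient);                                                   // new Client
NvRmAlloc(hClient, hClient, hDevice, NV01_DEVICE_0, &devParams);           // device under the client
NvRmAlloc(hClient, hDevice, hSubdevice, NV20_SUBDEVICE_0, &subdevParams);  // subdevice under the device

NV2080_CTRL_PERF_BOOST_PARAMS boost = {0};
boost.flags    = NV2080_CTRL_PERF_BOOST_FLAGS_CMD_BOOST_TO_MAX;            // assumed flag name
boost.duration = NV2080_CTRL_PERF_BOOST_DURATION_MAX;
NvRmControl(hClient, hSubdevice, NV2080_CTRL_CMD_PERF_BOOST, &boost, sizeof(boost));

// Keep the process (and its /dev/nvidiactl fd) alive so the request isn't cancelled.
usleep(10 * 1000 * 1000);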

nju-zjx commented 5 days ago

I will try it. Thanks a lot!

nju-zjx commented 5 days ago

https://gist.github.com/mtijanic/9c129900bfba774b39914ad11b0041f6 Sorry, this website won't open. Has it been moved somewhere else?

timur-tabi commented 4 days ago

That URL is correct. You might need to log into github.com first.

nju-zjx commented 2 days ago

Indeed, that's the case. I encountered the following issue during my attempt and look forward to an explanation!

The duration at P0 does not depend on NV2080_CTRL_PERF_BOOST_DURATION_MAX, but rather on the duration of the usleep() call, which is quite puzzling. Could you please explain the purpose of usleep() and why the GPU reverts from P0 to P8 once it returns?

mtijanic commented 2 days ago

These perfboosts are tied to the 2080/subdevice objects under which they were made. Ignoring all the plumbing, in OOP pseudocode what that thing does is:

{
  client = new Client;
  device = client.CreateChildObject(DEVICE);
  subdevice = device.CreateChildObject(SUBDEVICE);

  subdevice.Control(PERF_BOOST, ...params...);
}

Once the subdevice object gets destroyed, the perfboost request associated with it is cancelled. There are a few ways to destroy the subdevice object:

  1. NvRmFree(hSubdevice) - Explicitly tell the driver to delete this object
  2. NvRmFree(hClient) (or NvRmFree(hDevice)) - Explicitly delete a parent object, which removes everything allocated underneath it
  3. close(nvctl), which then implicitly does an NvRmFree() on all client objects that were allocated using that fd.

What happens in the app is that when usleep() returns and the program terminates, the kernel automatically closes all open files, which triggers case (3) above. You can use any other way to keep the program running, such as blocking on user input, polling on an fd, or even breaking into a debugger. As long as the program doesn't terminate and the kernel doesn't reap the fd, the boost will last for NV2080_CTRL_PERF_BOOST_DURATION_MAX.
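
So a minimal change to the example (same caveats as before; blocking on getchar() is just one way to keep the process alive) is to replace the usleep() with something that waits until you're done, and only then let the objects/fd go away:

// Block until the user presses Enter; the boost stays active (up to
// NV2080_CTRL_PERF_BOOST_DURATION_MAX) while the process and its
// /dev/nvidiactl fd stay alive.
printf("Boost active; press Enter to release it\n");
getchar();

// Case (1): explicitly free the subdevice, or just exit and let
// case (3) (the kernel closing the fd) clean everything up.
NvRmFree(hSubdevice);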


By the way, since this is not an actionable issue with the code, I will be converting this into a discussion thread.