Open ligeweiwu opened 1 year ago
@ligeweiwu,
The OSS DCGM version does not have the profiling module required for DCP fields (>1000) on GPUs before Hopper.
You can still use them with OSS if you copy libdcgmmoduleprofiling.so from an official DCGM package.
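The copy step could be scripted roughly as follows. This is only a sketch: the function name `copy_profiling_module` and both directory arguments are hypothetical, and the actual library locations depend on how the official package and the OSS build are laid out on your system.

```python
import shutil
from pathlib import Path


def copy_profiling_module(package_lib_dir, oss_lib_dir):
    """Copy libdcgmmoduleprofiling.so* from an official DCGM install
    into the library directory of an OSS build.

    Both directory arguments are assumptions; substitute the real
    locations on your system.
    """
    src_dir = Path(package_lib_dir)
    dst_dir = Path(oss_lib_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # Match the bare name and any versioned variants of the library.
    for src in sorted(src_dir.glob("libdcgmmoduleprofiling.so*")):
        dst = dst_dir / src.name
        # copy2 follows symlinks by default, so the file contents are copied.
        shutil.copy2(src, dst)
        copied.append(dst)

    if not copied:
        raise FileNotFoundError(
            f"no libdcgmmoduleprofiling.so* found in {src_dir}"
        )
    return copied
```

Calling `copy_profiling_module("<official lib dir>", "<OSS build lib dir>")` returns the list of files it copied, or raises `FileNotFoundError` if the official package directory does not contain the module.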
WBR, Nik
@nikkon-dev
Hi Nik, thanks for your reply.
I have another question.
Now I want to test the API dcgmProfGetSupportedMetricGroups using the open-source code (3.0.4), so I ran:
./dcgmi profile -l
It returned:
Error: Unable to Get supported metric groups: This request is serviced by a module of DCGM that is not currently loaded.
So my question is: to use dcgmProfGetSupportedMetricGroups, do I also need to copy libdcgmmoduleprofiling.so? Does dcgmProfGetSupportedMetricGroups depend on libdcgmmoduleprofiling.so?
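For anyone scripting this check, here is a minimal sketch that runs the same command and looks for the error text quoted above. The helper names `module_not_loaded` and `check_profile_list` are hypothetical, and the `./dcgmi` default path is an assumption about where the built binary lives.

```python
import subprocess

# Substring of the error message quoted in this thread.
MODULE_NOT_LOADED = "module of DCGM that is not currently loaded"


def module_not_loaded(dcgmi_output):
    """Return True if dcgmi output contains the module-not-loaded error."""
    return MODULE_NOT_LOADED in dcgmi_output


def check_profile_list(dcgmi_path="./dcgmi"):
    """Run 'dcgmi profile -l' and report whether the module error appeared.

    dcgmi_path is an assumption; point it at your own build.
    """
    result = subprocess.run(
        [dcgmi_path, "profile", "-l"],
        capture_output=True,
        text=True,
    )
    # Check both streams, since the error may be printed to either one.
    return module_not_loaded(result.stdout + result.stderr)
```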
Thanks
Hi, I have a question about the monitoring command dcgmi dmon -e 1009 (or any field ID greater than 1000). My working environment is:
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  Off  | 00000000:B1:00.0 Off |                    0 |
| N/A   27C    P0    34W / 250W |    475MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
When I install the release deb package and run "dcgmi dmon -e 1010", I get the expected result. But when I build the open-source code myself and run "dcgmi dmon -e 1010", I get the error "This request is serviced by a module of DCGM that is not currently loaded". Is there any difference between the release version and the open-source version for the dmon command with field IDs greater than 1000?
DCGM version: 3.0.4
Thanks