Open ligeweiwu opened 1 year ago
Hi Ligeweiwu - the memory bandwidth test is unfortunately not yet releasable as open source. To run the test with an open-source build, you can download a released version of DCGM that matches the source version you're building and copy its plugin libraries into your locally built plugins directory.
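For anyone following along, the copy step described above could be sketched roughly as below. This is only an illustration, not an official procedure: the function name `copy_dcgm_plugins` is made up for this example, both directory arguments are placeholders you would need to adjust, and skipping files that already exist in the local build is a choice made here so that locally built plugins are not overwritten.

```shell
#!/bin/sh
# Sketch only: copy plugin libraries from a released DCGM package into a
# locally built plugins directory. copy_dcgm_plugins is a hypothetical helper.
copy_dcgm_plugins() {
    src_dir="$1"   # plugins dir from the released package (location is an assumption)
    dst_dir="$2"   # plugins dir of the local build

    if [ ! -d "$src_dir" ] || [ ! -d "$dst_dir" ]; then
        echo "missing directory: $src_dir or $dst_dir" >&2
        return 1
    fi

    for lib in "$src_dir"/*.so*; do
        [ -e "$lib" ] || continue          # glob matched nothing
        name=$(basename "$lib")
        if [ -e "$dst_dir/$name" ]; then
            echo "kept local $name"        # do not overwrite locally built plugins
        else
            cp "$lib" "$dst_dir/"
            echo "copied $name"
        fi
    done
}

# Example call (the first path is a placeholder, the second is the build
# directory mentioned in this thread):
# copy_dcgm_plugins /path/to/released/plugins/cuda11 \
#     /username/DCGM/_out/Linux-amd64-debug/share/nvidia-validation-suite/plugins/cuda11
```

The released package and the local build should be the same DCGM version, as noted above; otherwise the copied plugins may not load.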
@dbeer Thanks for your reply. I have one more thing I would like to confirm. In plugin_src:
memory <-> -r 3 test: GPU Memory
memtest <-> -r 4 test: Memory Stress
memory bandwidth <-> no source code, can only be used from the released package
Is that right?
Thanks
That's correct, although the memory test should run with -r 2 and higher.
@dbeer Hi dbeer, thanks for your reply. I am building the DCGM source code at version 3.0.4 (commit: f6fe5654b780873da528b84cb3d7de10d7abe0d1), but I cannot find a download link for this version. Could you tell me where I can download the corresponding released package? Thanks.
Hi, I am building DCGM from source and using nvvs/dcgmi to run the diagnostic tests. All the test plugins are built as .so files. But when I try to run the "memory bandwidth" diagnostic, I get an error:
./dcgmi diag -r "memory bandwidth" -g 2
Error: requested test "memory bandwidth" was not found among possible test choices.
In my case, all the plugin .so files are in /username/DCGM/_out/Linux-amd64-debug/share/nvidia-validation-suite/plugins/cuda11, and none of them is named "memory bandwidth". I also looked at the source code, and I don't think it has a test option named "memory bandwidth". It only has "memtest".
So could you please tell me how I can run "memory bandwidth" using the DCGM source code?
By the way, the memtest is OK ("./dcgmi diag -r memtest -g 2" works fine, and I also see the corresponding libMemtest.so in plugins/cuda11, and the source code has the option "memtest").
Thanks.