@pintohutch running the build script will produce the libnvperf_dcgm_host.so file in the following directory:
_out/Linux-amd64-release/lib/
I just ran it to verify after deleting the _out directory. It is also found in the tests directory:
_out/Linux-amd64-release/share/dcgm_tests/apps/amd64/
Thanks for the quick response @bmarchant.
I'm on master, commit 18b87c715750d0d44b185dfeb9a7d8e2597443a4.
I'm not seeing it. Maybe I'm doing something wrong? I just removed my previous _out/ and re-ran:
cd dcgmbuild
sudo ./build.sh
cd ..
./build.sh
Checking the output contents:
ls _out/Linux-amd64-release/lib
cmake libdcgmmodulediag.so libdcgmmoduleintrospect.so.3 libdcgmmodulepolicy.so.3.3.5 libdcgm_stub.a
libdcgm_cublas_proxy10.so libdcgmmodulediag.so.3 libdcgmmoduleintrospect.so.3.3.5 libdcgmmodulesysmon.so libnvml_injection.so
libdcgm_cublas_proxy11.so libdcgmmodulediag.so.3.3.5 libdcgmmodulenvswitch.so libdcgmmodulesysmon.so.3 libnvml_injection.so.1
libdcgm_cublas_proxy12.so libdcgmmodulehealth.so libdcgmmodulenvswitch.so.3 libdcgmmodulesysmon.so.3.3.5 libnvml_injection.so.1.0
libdcgmmoduleconfig.so libdcgmmodulehealth.so.3 libdcgmmodulenvswitch.so.3.3.5 libdcgm.so
libdcgmmoduleconfig.so.3 libdcgmmodulehealth.so.3.3.5 libdcgmmodulepolicy.so libdcgm.so.3
libdcgmmoduleconfig.so.3.3.5 libdcgmmoduleintrospect.so libdcgmmodulepolicy.so.3 libdcgm.so.3.3.5
ls _out/Linux-amd64-release/share/dcgm_tests/apps/amd64
configuration_sample field_value_sample libdcgmmoduleconfig.so.3 libdcgmmodulehealth.so.3 libdcgmmodulenvswitch.so.3 libdcgmmodulesysmon.so.3 libnvml_injection.so.1 stub_library_test
dcgmi health_sample libdcgmmoduleconfig.so.3.3.5 libdcgmmodulehealth.so.3.3.5 libdcgmmodulenvswitch.so.3.3.5 libdcgmmodulesysmon.so.3.3.5 libnvml_injection.so.1.0 testdcgmunittests
dcgmproftester10 libdcgm_cublas_proxy10.so libdcgmmodulediag.so libdcgmmoduleintrospect.so libdcgmmodulepolicy.so libdcgm.so modules_sample
dcgmproftester11 libdcgm_cublas_proxy11.so libdcgmmodulediag.so.3 libdcgmmoduleintrospect.so.3 libdcgmmodulepolicy.so.3 libdcgm.so.3 nv-hostengine
dcgmproftester12 libdcgm_cublas_proxy12.so libdcgmmodulediag.so.3.3.5 libdcgmmoduleintrospect.so.3.3.5 libdcgmmodulepolicy.so.3.3.5 libdcgm.so.3.3.5 policy_sample
DcgmProfTesterKernels.ptx libdcgmmoduleconfig.so libdcgmmodulehealth.so libdcgmmodulenvswitch.so libdcgmmodulesysmon.so libnvml_injection.so process_stats_sample
@pintohutch Sorry for the confusion. That library is closed source and enables "continuous mode profiling" for DC profiling. Apologies for my earlier response, I was looking at the wrong repo.
Ah thanks for confirming @bmarchant.
I suppose the best way to get a compatible version of the library would be to pull it from a Docker image with a matching version of the compiled source?
@pintohutch,
That's right. You can get that library from any official DCGM package (docker/deb/rpm) and place it in a location where nv-hostengine can find it. You will also need to grab the libdcgmmoduleprofiling library.
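To check whether a given directory already has both pieces, something like the sketch below may help. It is not part of DCGM: the function name `missing_dcgm_libs` is hypothetical, the profiling module is assumed to be named `libdcgmmoduleprofiling`, and matching on `.so*` is an assumption meant to cover versioned file names such as `.so.3`.

```shell
# Hypothetical helper (not shipped with DCGM): print the name of each
# closed-source library that is absent from the directory given as $1.
# Library names are taken from this thread; the .so* glob is an assumption
# so that versioned files such as .so.3 also count as present.
missing_dcgm_libs() {
    dir="$1"
    for name in libnvperf_dcgm_host libdcgmmoduleprofiling; do
        found=no
        for f in "$dir/$name".so*; do
            # An unmatched glob stays literal, so test that the path exists.
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            echo "$name"
        fi
    done
}
```

Running `missing_dcgm_libs _out/Linux-amd64-release/lib` against an OSS build would be expected to print both names; empty output means both libraries were found in that directory.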
Thanks @nikkon-dev - I didn't notice that was missing as well.
@nikkon-dev @bmarchant - my follow-up question here is: which field IDs do the libraries built from this OSS repo expose, compared to what's only available through the closed-source libraries (e.g. libdcgmmoduleprofiling and libnvperf_dcgm_host)?
Is there any documentation around that?
I can close this issue and open a new one to make the ask clearer if that's better
@pintohutch,
Unfortunately, that is not currently documented. However, I have created a ticket to update the documentation with more accurate details about the modules that provide each Field ID and the differences between OSS and official builds.
All modules not included in OSS can be utilized from official DCGM packages, and DCGM will supply all Field IDs. However, it is difficult to determine which field corresponds to which module.
Ok thanks @nikkon-dev.
Lemme know if there's a place I can track that effort. If it's internally tracked, that's fine too.
Feel free to close this as my original question has been answered - thanks for the prompt responses!
@pintohutch,
That's internally tracked, as we have not open-sourced the documentation sources (thus no GitHub issues).
WBR, Nik
Hey @nikkon-dev or @bmarchant - qq: are there any plans to open-source the profiling modules in the future?
@pintohutch,
I want to clarify that there are currently no plans to use the profiling module for newer architectures. This module was designed for pre-Hopper architectures, and newer architectures utilize GPM functionality via NVML, so it is not needed at all.
It's worth noting that the profiling module relies on undocumented and unofficial APIs that we cannot make open source.
Hey @nikkon-dev - thanks for the response and for clarifying this.
Hello,
I am running the build.sh script to build DCGM, however I do not see the libnvperf_dcgm_host.so file generated in the build output in _out/Linux-amd64-release/. Is there a flag I need to pass to the script to generate this? Or is the library not built using the source from this repo?
Thanks