CFSworks / nvml_fix

A workaround for an annoying bug in nVidia's NVML library. Allows nvidia-smi to work once more!

Add support for 390.x driver and refactor existing code #12

Closed tofurky closed 6 years ago

tofurky commented 6 years ago

This adds support for 390.x, which required significant changes because that driver uses nvmlInitWithFlags(unsigned int flags), whereas the previous nvmlInit() functions took no parameters. Effort was made to keep backwards compatibility (e.g. with 331 and below) through extensive use of #if/#elif throughout nvml_fix.c. However, the old versions have not been tested beyond verifying that they compile.

Also, the Makefile now checks if a supported TARGET_VERSION was provided before attempting any compilation. Without this change, the #error from nvml_fix.c is lost in a wall of other errors due to the refactoring. The make target is named "check_supported" and is called by default.
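The idea behind such a pre-flight check can be sketched in plain shell. This is an illustration only, not the Makefile's actual "check_supported" recipe: the function name is borrowed from the target, the SUPPORTED_VERS list is a hypothetical placeholder, and the variable is spelled TARGET_VER as in the build command quoted later in this thread.

```shell
#!/bin/sh
# Sketch of a "fail early with one clear message" version check.
# ASSUMPTION: this list is made up for illustration; the real set of
# supported versions is whatever nvml_fix.c has offsets for.
SUPPORTED_VERS="325.08 331.20 390.25"

# check_supported VERSION
# Returns 0 if VERSION is in SUPPORTED_VERS; otherwise prints a single
# error line to stderr and returns 1, so the build can stop before the
# compiler produces a wall of unrelated errors.
check_supported() {
    target_ver=$1
    for v in $SUPPORTED_VERS; do
        if [ "$v" = "$target_ver" ]; then
            return 0
        fi
    done
    echo "error: unsupported TARGET_VER '$target_ver' (supported: $SUPPORTED_VERS)" >&2
    return 1
}
```

A build script could then run something like check_supported "$TARGET_VER" || exit 1 before invoking the compiler.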

For the time being, only the offsets for x86_64 have been added.

Other changes:

Signed-off-by: Matt Merhar <mattmerhar@protonmail.com>

CFSworks commented 6 years ago

Thanks!!

Thaodan commented 6 years ago

Does Nvidia ship a pkg-config file? That would avoid hard-coding libdir. Currently libdir is very Debian-centric.

tofurky commented 6 years ago

no, there's no pkgconfig for nvidia that i found. i did try several things to autodetect the libdir (e.g. "find" and some other shell string manipulation), but it was all convoluted, potentially slow, and still wouldn't be 100% effective, so the compromise was to consolidate into a single variable and just add a note describing what to change it to. i doubt many people are building like 'make DESTDIR=/ PREFIX=... libdir=... TARGET_VER=...', and if they are, i'd think they're able to edit the Makefile. if there's a clean/fast/foolproof automated way to detect libdir then i agree it would be nice to have.

tofurky commented 6 years ago

though, come to think of it, README.md should be modified to reflect the changes i.e. libdir.

i'll do another PR to fix that.

tofurky commented 6 years ago

please take a look at the updated README at https://github.com/tofurky/nvml_fix/tree/libdir-doc-fix and let me know if you think anything else should be changed. i'll open another PR.

Thaodan commented 6 years ago

Maybe libglvnd could help with this. I mean, it should at least use the same libdir.

As for the version: there's a function in NVML to query the driver version; maybe that could help too.

tofurky commented 6 years ago

i think the documentation PR i submitted makes the compilation/install steps a lot more clear, but if we later decided to automate it:

maybe something based on dirname $(ldconfig -NXp|fgrep libnvidia-ml.so.1|awk '{print $4}') would work to find the libdir. and something like ldconfig -NXp|fgrep libnvidia-glcore.so.|awk '{print $1}'|cut -d '.' -f3-4 to find TARGET_VER.
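A slightly more defensive version of those two pipelines might look like the sketch below. The function names nvml_libdir and nvidia_target_ver are hypothetical, and the sketch assumes the usual "name (flags) => path" layout of the ldconfig -p listing. Both functions read that listing on stdin, so they can be tried against canned input on a machine with no driver installed.

```shell
#!/bin/sh
# Sketch: derive libdir and TARGET_VER from an `ldconfig -p` style listing.
# ASSUMPTION: each entry looks like
#   libnvidia-ml.so.1 (libc6,x86-64) => /some/dir/libnvidia-ml.so.1

# Print the directory of the first libnvidia-ml.so.1 entry on stdin.
# Returns 1 (and prints nothing) if no such entry exists.
# Note: on multilib systems the first match may be the 32-bit library.
nvml_libdir() {
    path=$(awk '$1 == "libnvidia-ml.so.1" { print $NF; exit }')
    [ -n "$path" ] || return 1
    dirname "$path"
}

# Print the driver version taken from the libnvidia-glcore.so.<ver> name,
# keeping the same two components as the original `cut -f3-4`.
# Returns 1 (and prints nothing) if no such entry exists.
nvidia_target_ver() {
    ver=$(awk '$1 ~ /^libnvidia-glcore\.so\./ { print $1; exit }' | cut -d '.' -f3-4)
    [ -n "$ver" ] || return 1
    printf '%s\n' "$ver"
}

# Intended use on a machine with the driver installed:
#   libdir=$(ldconfig -NXp | nvml_libdir)
#   TARGET_VER=$(ldconfig -NXp | nvidia_target_ver)
```

Because both functions return non-zero when nothing matches, a caller can fall back to a manually supplied libdir or TARGET_VER instead of silently building with empty values.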

still there are a few things to consider:

logic could be like:

this would make it harder to compile the shim without having the driver/library installed, which the current README.md explicitly allows: "Note: The nVidia drivers are not a dependency for building the shims."