DualCoder / vgpu_unlock

Unlock vGPU functionality for consumer-grade GPUs.
MIT License

dkms install fails #115

Open · remopini opened this issue 1 year ago

remopini commented 1 year ago

I have an issue when trying to install the kernel module:

root@proxmox01:~# dkms install -m nvidia -v 525.85.07
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j24 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.83-1-pve modules......(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.83-1-pve (x86_64)
Consult /var/lib/dkms/nvidia/525.85.07/build/make.log for more information.

If I look at the log referenced above, I see a lot of errors in the form of:

...
/root/vgpu_unlock/vgpu_unlock_hooks.c:589:4: note: (near initialization for ‘vgpu_unlock_vgpu[155].num_blocks’)
  589 |   { (10 + 2 * strlen(name) + 15) / 16, /* num_blocks */ \
      |   ^
/root/vgpu_unlock/vgpu_unlock_hooks.c:770:2: note: in expansion of macro ‘VGPU’
  770 |  VGPU(0x2230, 0x151a, "NVIDIA RTXA6000-48C"),
      |  ^~~~
/root/vgpu_unlock/vgpu_unlock_hooks.c:590:4: error: initializer element is not constant
  590 |   strlen(name), /* name1_len */ \
      |   ^~~~~~
/root/vgpu_unlock/vgpu_unlock_hooks.c:770:2: note: in expansion of macro ‘VGPU’
  770 |  VGPU(0x2230, 0x151a, "NVIDIA RTXA6000-48C"),
      |  ^~~~
...

Is this caused by anything I screwed up?

eebrains commented 1 year ago

The issue is due to the call to 'strlen' in the helper macro: vgpu_unlock_vgpu[] is a static array, so every initializer has to be a compile-time constant, and a function call is not one. I was able to fix it locally by adding a 'len' parameter to the macro for the string length and replacing strlen(name) with 'len' inside the macro. Then I manually edited each entry to pass the proper length value.

So basically the VGPU macro looks like this:

#define VGPU(dev_id, subsys_id, name, len) \
        { (10 + 2 * (len) + 15) / 16,        /* num_blocks */     \
          (len),                             /* name1_len */      \
          (len),                             /* name2_len */      \
          (dev_id),                          /* dev_id */         \
          0,                                 /* vend_id */        \
          (subsys_id),                       /* subsys_id */      \
          0,                                 /* subsys_vend_id */ \
          { name name } }                    /* name1_2 */

and the initializer looks like this:

static vgpu_unlock_vgpu_t vgpu_unlock_vgpu[] =
{
        /* Tesla M10 (Maxwell) */
        VGPU(0x13bd, 0x11cc, "GRID M10-0B", 11),
        VGPU(0x13bd, 0x11cd, "GRID M10-1B", 11),
        VGPU(0x13bd, 0x1339, "GRID M10-1B4", 12),
        VGPU(0x13bd, 0x1286, "GRID M10-2B", 11),
        VGPU(0x13bd, 0x12ee, "GRID M10-2B4", 12),
...

That initializer is pretty long, and you have to edit every entry. It took a while to do it in nano... but it worked :)

jforman96 commented 1 year ago

Hello, I tried to make it work according to your instructions, but I encountered another error. Do you know where the problem could be?

In file included from /var/lib/dkms/nvidia/525.105.14/build/nvidia/os-interface.c:25:
/root/vgpu_unlock/vgpu_unlock_hooks.c:790:17: warning: ‘vgpu_unlock_bar3_end’ defined but not used [-Wunused-variable]
  790 | static uint64_t vgpu_unlock_bar3_end;
      |                 ^~~~~~~~~~~~~~~~~~~~
/root/vgpu_unlock/vgpu_unlock_hooks.c:789:17: warning: ‘vgpu_unlock_bar3_beg’ defined but not used [-Wunused-variable]
  789 | static uint64_t vgpu_unlock_bar3_beg;
      |                 ^~~~~~~~~~~~~~~~~~~~
/root/vgpu_unlock/vgpu_unlock_hooks.c:788:13: warning: ‘vgpu_unlock_bar3_mapped’ defined but not used [-Wunused-variable]
  788 | static bool vgpu_unlock_bar3_mapped = FALSE;
      |             ^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:297: /var/lib/dkms/nvidia/525.105.14/build/nvidia/os-interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:1909: /var/lib/dkms/nvidia/525.105.14/build] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.107-2-pve'
make: *** [Makefile:82: modules] Error 2

Log file: make.log

ksqeib commented 1 year ago

The issue is due to the call to 'strlen' in the helper macro. I was able to fix it locally by replacing strlen(name) with sizeof(name) - 1 within the helper macro. Because sizeof on a string literal is a compile-time constant (it counts the terminating NUL, hence the - 1), the entries themselves do not need to be edited.

So basically the VGPU macro looks like this:

#define VGPU(dev_id, subsys_id, name) \
    { (10 + 2 * (sizeof(name) - 1) + 15) / 16, /* num_blocks */     \
      sizeof(name) - 1,                        /* name1_len */      \
      sizeof(name) - 1,                        /* name2_len */      \
      (dev_id),                                /* dev_id */         \
      0,                                       /* vend_id */        \
      (subsys_id),                             /* subsys_id */      \
      0,                                       /* subsys_vend_id */ \
      { name name } }                          /* name1_2 */

It worked :)

My English is poor, so I copied @eebrains's answer. Thanks for the answer template. XD