GPU operator DKMS build failure on 22.04

Summary

When deploying MicroK8s with the GPU addon on an AWS machine running Ubuntu 22.04, the NVIDIA driver build fails with a DKMS compile error:

/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_perf_events_test.c: In function 'test_events':
/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_perf_events_test.c:83:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
   83 | }
      | ^
/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_va_block.c: In function 'uvm_va_block_check_logical_permissions':
/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_va_block.c:10755:60: warning: implicit conversion from 'uvm_fault_type_t' to 'uvm_fault_access_type_t' [-Wenum-conversion]
10755 |     uvm_prot_t access_prot = uvm_fault_access_type_to_prot(access_type);
      |                                                            ^~~~~~~~~~~
/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_va_block.c: In function 'block_cpu_fault_locked':
/usr/src/nvidia-535.129.03/kernel/nvidia-uvm/uvm_va_block.c:10890:53: warning: implicit conversion from 'uvm_fault_access_type_t' to 'uvm_fault_type_t' [-Wenum-conversion]
10890 |                                                     fault_access_type,
      |                                                     ^~~~~~~~~~~~~~~~~
make[2]: *** [/usr/src/linux-headers-6.8.0-1015-aws/Makefile:1925: /usr/src/nvidia-535.129.03/kernel] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make: *** [Makefile:82: modules] Error 2
Stopping NVIDIA persistence daemon...
Unloading NVIDIA driver kernel modules...
Unmounting NVIDIA driver rootfs...
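
The excerpt above contains only compiler warnings followed by the make failures, so the fatal compiler error is presumably further up in the driver container log. A minimal sketch for filtering such a log down to its fatal lines is shown here; the function name fatal_build_lines is hypothetical, and the patterns assume GCC-style "error:" diagnostics and GNU make "*** ... Error N" lines.

```shell
# Print only the fatal lines of a driver build log read on stdin:
# GCC-style compiler errors ("error:") and GNU make failures ("*** ... Error N").
# Warnings such as -Wframe-larger-than or -Wenum-conversion are dropped.
fatal_build_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '(^|[^[:alnum:]_])error:|\*\*\* .*Error [0-9]+' || true
}
```

Possible usage, assuming the daemonset is named nvidia-driver-daemonset (inferred from the pod name in this report): microk8s kubectl logs -n gpu-operator-resources daemonset/nvidia-driver-daemonset | fatal_build_lines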

This is likely because the older operator deployed by default installs an older driver version (535.129.03 in the log above) that does not match the function signatures expected by later kernels (6.8.0-1015-aws here). Deploying with the latest operator succeeds: microk8s enable gpu --version 24.6.2

gpu-operator-resources   gpu-operator-node-feature-discovery-worker-pntfz   1/1   Running     0   9m3s
gpu-operator-resources   gpu-operator-node-feature-discovery-worker-xcgxn   1/1   Running     0   9m3s
gpu-operator-resources   gpu-operator-node-feature-discovery-worker-xxdlt   1/1   Running     0   9m3s
gpu-operator-resources   nvidia-container-toolkit-daemonset-hv4hc           1/1   Running     0   8m38s
gpu-operator-resources   nvidia-cuda-validator-cpkb7                        0/1   Completed   0   3m54s
gpu-operator-resources   nvidia-dcgm-exporter-s762v                         1/1   Running     0   8m38s
gpu-operator-resources   nvidia-device-plugin-daemonset-lh97z               1/1   Running     0   8m38s
gpu-operator-resources   nvidia-driver-daemonset-t84r4                      1/1   Running     0   8m44s
gpu-operator-resources   nvidia-operator-validator-8cnnk                    1/1   Running     0   8m38s
ingress                  nginx-ingress-microk8s-controller-f5v8r            1/1   Running     0   85m

inspection-report-20241004_143256.tar.gz

Reproduction Steps

Deploy a GPU-enabled machine: juju add-machine --constraints='instance-type=g4dn.xlarge root-disk=100G'
Run: microk8s enable gpu
The driver daemonset crashes with a DKMS compile error
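
The addon steps above, together with the version pin mentioned in the summary, can be sketched as a small script. This is only an illustration: the helper names run, enable_gpu and show_gpu_pods are hypothetical, the namespace is taken from the pod listing in this report, and the script prints the commands by default instead of executing them.

```shell
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them on the affected node.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Enable the GPU addon, optionally pinning the operator version
# (for example 24.6.2, the version reported to work in this issue).
enable_gpu() {
    if [ -n "${1:-}" ]; then
        run microk8s enable gpu --version "$1"
    else
        run microk8s enable gpu
    fi
}

# List the operator pods; the namespace comes from the pod listing in this report.
show_gpu_pods() {
    run microk8s kubectl get pods -n gpu-operator-resources
}
```

Calling enable_gpu with no argument corresponds to the failing default; after microk8s disable gpu, calling enable_gpu 24.6.2 followed by show_gpu_pods corresponds to the working deployment described in the summary.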

Introspection Report

Can you suggest a fix?

Change the default GPU operator version in the addon's enable script to 24.6.2:

https://github.com/canonical/microk8s-core-addons/blob/main/addons/nvidia/enable#L216
