Open maarten-blokker opened 9 months ago
I have similar error with latest version
I0525 16:11:36.754627 1 main.go:305] AMD GPU device plugin for Kubernetes
I0525 16:11:36.754694 1 main.go:305] ./k8s-device-plugin version v1.25.2.7-7-g813f150
I0525 16:11:36.754701 1 main.go:305] hwloc: _VERSION: 2.10.0, _API_VERSION: 0x00020800, _COMPONENT_ABI: 7, Runtime: 0x00020800
I0525 16:11:36.754709 1 manager.go:42] Starting device plugin manager
I0525 16:11:36.754719 1 manager.go:46] Registering for system signal notifications
I0525 16:11:36.754985 1 manager.go:52] Registering for notifications of filesystem changes in device plugin directory
panic: runtime error: invalid memory address or nil pointer dereference
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x53bbf3]
goroutine 1 [running]:
github.com/fsnotify/fsnotify.(*Watcher).isClosed(...)
/go/src/github.com/ROCm/k8s-device-plugin/vendor/github.com/fsnotify/fsnotify/inotify.go:66
github.com/fsnotify/fsnotify.(*Watcher).Close(0x0)
/go/src/github.com/ROCm/k8s-device-plugin/vendor/github.com/fsnotify/fsnotify/inotify.go:75 +0x13
panic({0x8cf480?, 0xd53320?})
/usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/fsnotify/fsnotify.(*Watcher).isClosed(...)
/go/src/github.com/ROCm/k8s-device-plugin/vendor/github.com/fsnotify/fsnotify/inotify.go:66
github.com/fsnotify/fsnotify.(*Watcher).Add(0x0, {0x977505?, 0xc000204010?})
/go/src/github.com/ROCm/k8s-device-plugin/vendor/github.com/fsnotify/fsnotify/inotify.go:94 +0x57
github.com/kubevirt/device-plugin-manager/pkg/dpm.(*Manager).Run(0xc000033550)
/go/src/github.com/ROCm/k8s-device-plugin/vendor/github.com/kubevirt/device-plugin-manager/pkg/dpm/manager.go:55 +0x21f
main.main()
/go/src/github.com/ROCm/k8s-device-plugin/cmd/k8s-device-plugin/main.go:331 +0x4e9
EDIT: Installing latest ROCm and hard restart fixed the issue. https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.1.60101-1_all.deb
After hard restart it happened again :-(
I am experiencing a runtime error while trying to install the AMD GPU Helm Chart (link: AMD GPU Helm Chart) in my Kubernetes cluster. The pod spawned by the daemonset fails to run, and the error log indicates a segmentation violation (SIGSEGV), perhaps related to some permission issues?
Steps to Reproduce:
Expected Behavior: The AMD GPU device plugin should install without errors and run successfully in the Kubernetes cluster.
Actual Behavior: The pod crashes immediately after starting, with the log indicating a segmentation fault.
Extra information