NVIDIA / gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Apache License 2.0

move permissions for events from Role to ClusterRole #1102

Closed: tariq1890 closed this 3 weeks ago

tariq1890 commented 3 weeks ago

Events are a namespaced resource. Events about cluster-scoped objects such as Nodes are created in the "default" namespace (this is a Kubernetes implementation detail). Since gpu-operator needs to create events in a namespace outside its own, we grant it cluster-scoped permissions to manage events.
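The namespace choice described above can be sketched in Go. The sketch assumes the behavior of client-go's event recorder, which writes an event to the involved object's namespace and falls back to "default" when that namespace is empty, as it is for a Node. ObjectRef and eventNamespace are hypothetical, simplified stand-ins and are not the operator's or client-go's actual code.

```go
package main

import "fmt"

// ObjectRef is a hypothetical, trimmed-down stand-in for an object reference,
// keeping only the fields this sketch needs.
type ObjectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// defaultNamespace corresponds to Kubernetes' "default" namespace.
const defaultNamespace = "default"

// eventNamespace returns the namespace an event about ref is written to:
// the object's own namespace, or "default" for cluster-scoped objects
// (such as Nodes) that have no namespace.
func eventNamespace(ref ObjectRef) string {
	if ref.Namespace == "" {
		return defaultNamespace
	}
	return ref.Namespace
}

func main() {
	// Node name taken from the error log in this PR; Nodes are cluster-scoped.
	node := ObjectRef{Kind: "Node", Name: "ocp414-chris"}
	// Hypothetical namespaced object living in the operator's own namespace.
	pod := ObjectRef{Kind: "Pod", Namespace: "nvidia-gpu-operator", Name: "example-pod"}

	fmt.Println(eventNamespace(node)) // default
	fmt.Println(eventNamespace(pod))  // nvidia-gpu-operator
}
```

Because every node-level event, such as the GPUDriverUpgrade events, lands in "default", a Role scoped to the operator's namespace cannot cover it.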

This fixes #1101

tariq1890 commented 3 weeks ago

Verified that this change fixes the following error:


E1104 18:00:17.820612       1 event.go:359] "Server rejected event (will not retry!)" err="events is forbidden: User \"system:serviceaccount:nvidia-gpu-operator:gpu-operator\" cannot create resource \"events\" in API group \"\" in the namespace \"default\"" event="&Event{ObjectMeta:{ocp414-chris.1804d5c96788e098  default    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},InvolvedObject:ObjectReference{Kind:Node,Namespace:,Name:ocp414-chris,UID:ee812e4d-cc9d-4c7b-aae9-2470ba2d4c25,APIVersion:v1,ResourceVersion:282257282,FieldPath:,},Reason:GPUDriverUpgrade,Message:Successfully updated node state label to [cordon-required]%!(EXTRA <nil>),Source:EventSource{Component:nvidia-gpu-operator,Host:,},FirstTimestamp:2024-11-04 18:00:17.819279512 +0000 UTC m=+341254.759147241,LastTimestamp:2024-11-04 18:00:17.819279512 +0000 UTC m=+341254.759147241,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:nvidia-gpu-operator,ReportingInstance:,}"
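The rejection above comes down to scope. A permission granted through a Role applies only in that Role's namespace, while a ClusterRole bound with a ClusterRoleBinding applies in every namespace. Below is a minimal Go model of that check, assuming simplified rules in the core ("") API group. Rule, Grant, and allowed are hypothetical helpers, not the Kubernetes authorizer. The verb list is also an assumption, since the PR text does not show the exact verbs granted.

```go
package main

import "fmt"

// Rule is a hypothetical simplification of an RBAC PolicyRule in the core API group.
type Rule struct {
	Verbs     []string
	Resources []string
}

// Grant pairs a rule with its scope. Namespace == "" models a ClusterRole bound
// via a ClusterRoleBinding (valid in all namespaces); a non-empty Namespace
// models a Role bound via a RoleBinding in that single namespace.
type Grant struct {
	Namespace string
	Rule      Rule
}

// contains reports whether list includes s or the "*" wildcard.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s || v == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any grant permits verb on resource in namespace.
func allowed(grants []Grant, verb, resource, namespace string) bool {
	for _, g := range grants {
		inScope := g.Namespace == "" || g.Namespace == namespace
		if inScope && contains(g.Rule.Verbs, verb) && contains(g.Rule.Resources, resource) {
			return true
		}
	}
	return false
}

func main() {
	// Assumed verbs; the actual rule in the PR may list more or different verbs.
	events := Rule{Verbs: []string{"create", "patch"}, Resources: []string{"events"}}

	before := []Grant{{Namespace: "nvidia-gpu-operator", Rule: events}} // Role
	after := []Grant{{Namespace: "", Rule: events}}                     // ClusterRole

	fmt.Println(allowed(before, "create", "events", "default")) // false
	fmt.Println(allowed(after, "create", "events", "default"))  // true
}
```

The first case reproduces the "cannot create resource \"events\" ... in the namespace \"default\"" decision from the log, and the second shows why moving the rule to a ClusterRole resolves it.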