Azure / AKS

Azure Kubernetes Service

[BUG] `microsoft-defender-low-level-collector` crashloops sporadically after auto-upgrading cluster to 1.29.2. #4238

Open kristeey opened 3 weeks ago

kristeey commented 3 weeks ago

Describe the bug: `microsoft-defender-low-level-collector` crashloops and restarts sporadically after the cluster was auto-upgraded to 1.29.2.

Expected behavior: No crashloops or unexpected restarts caused by a goroutine panic.

Screenshots:

Crash Info
● Container microsoft-defender-low-level-collector
● Restarts 380
● Status WAITING
● Reason CrashLoopBackOff

Previous Container
● Status TERMINATED
● Reason Error
● Started at 2024-04-25T11:44:58Z
● Finished at 2024-04-25T11:46:38Z

Logs from microsoft-defender-low-level-collector container

time="2024-04-25T11:44:58Z" level=info msg="Starting collector manager"
time="2024-04-25T11:44:58Z" level=info msg="Running ig trace exec --auto-mount-filesystems --cwd --host -o json=+runtime,+k8s --filter=args:!~^/usr/bin/nice --filter=args:!~^/usr/bin/runc --filter=args:!~^/usr/bin/docker-init --filter=args:!~^/sbin/xtables-multi --filter=args:!~^/usr/sbin/runc --filter=runtime.containerImageName:!~^mcr.microsoft.com/aks --filter=runtime.containerImageName:!~^mcr.microsoft.com/azuremonitor --filter=runtime.containerImageName:!~^mcr.microsoft.com/azuredefender --filter=runtime.containerImageName:!~^mcr.microsoft.com/containernetworking --filter=runtime.containerImageName:!~^mcr.microsoft.com/azure-pipelines --filter=runtime.containerImageName:!~^mcr.microsoft.com/k8s --filter=runtime.containerImageName:!~^mcr.microsoft.com/azure-policy --filter=runtime.containerImageName:!~^mcr.microsoft.com/oss/ --filter=runtime.containerImageName:!~^mcr.microsoft.com/azure-application-gateway --filter=args:!~^.+/var/lib/docker/ --filter=args:!~^.+/var/log/pods --filter=args:!~^.+du\\s-x\\s-s\\s-B\\s1\\s/var --filter=args:!~^.+find\\s+/var/lib/docker/ --filter=args:!~^.+find\\s+/var/log/pods --filter=args:!~^.+find\\s+/var/lib/kubelet --filter=args:!~^.+bin/WALinuxAgent"
time="2024-04-25T11:45:00Z" level=info msg="IG stderr exception: time=\"2024-04-25T11:45:00Z\" level=error msg=\"cgroup enricher: failed to get cgroup paths on container cac7fe52996e8eb08e17d9aa6397271ed77ef4b5d34aece9a7b85a029026f148: cgroup path not found in /proc/PID/cgroup\""
panic: runtime error: slice bounds out of range [1:0]
goroutine 10 [running]:
tivan.ms/collectors/lowlevel/parser.toProcessEvent({0xc00007c000, 0x6d})
    /go/LowLevelCollector/parser/process_parser.go:56 +0x611
tivan.ms/collectors/lowlevel/parser.(*ProcessParser).Parse(0xc0003c6910, {0xc00007c000?, 0xc000047258?})
    /go/LowLevelCollector/parser/process_parser.go:34 +0x27
tivan.ms/collectors/lowlevel/parser.(*EventParserManagerImpl).parse(0xc0003c8570, {0xc00007c000, 0x6d})
    /go/LowLevelCollector/parser/event_parser_manager.go:70 +0x73
tivan.ms/collectors/lowlevel/parser.(*EventParserManagerImpl).startProcessingEvents(0xc0003c8570)
    /go/LowLevelCollector/parser/event_parser_manager.go:57 +0x3e
created by tivan.ms/collectors/lowlevel/parser.(*EventParserManagerImpl).innerStart in goroutine 1
    /go/LowLevelCollector/parser/event_parser_manager.go:50 +0x56
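
For readers unfamiliar with this class of panic: `slice bounds out of range [1:0]` is what the Go runtime reports when a slice expression's low index (1) is greater than its high index (0). The collector's source is not shown in this issue, so the following is only a minimal sketch of one common way to hit that message, assuming a parser that strips one leading and one trailing character from a field and then receives a one-character field. The names `unsafeTrim`, `trimDelimiters`, and `recoverTrim` are hypothetical and do not come from `process_parser.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// unsafeTrim is a hypothetical helper that drops the first and last byte
// of a field. For a one-byte field it evaluates field[1:0] and panics.
func unsafeTrim(field string) string {
	return field[1 : len(field)-1]
}

// trimDelimiters is a guarded variant: it checks the length first and
// returns an error instead of panicking on short input.
func trimDelimiters(field string) (string, error) {
	if len(field) < 2 {
		return "", errors.New("field too short to trim delimiters")
	}
	return field[1 : len(field)-1], nil
}

// recoverTrim calls unsafeTrim and converts a panic into an error, so one
// malformed field does not terminate the whole process.
func recoverTrim(field string) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return unsafeTrim(field), nil
}

func main() {
	// A well-formed quoted field, then two malformed ones.
	for _, field := range []string{`"ls"`, `"`, ""} {
		if _, err := recoverTrim(field); err != nil {
			fmt.Printf("unguarded %q: %v\n", field, err)
		}
		out, err := trimDelimiters(field)
		fmt.Printf("guarded   %q: out=%q err=%v\n", field, out, err)
	}
}
```

Whether the real parser fails this way is an assumption. The sketch only shows that a length check (or a `recover` in the processing goroutine) turns unexpected input into an error rather than a container restart.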

Environment (please complete the following information):

sandeep041193 commented 1 week ago

Hey,

I'm trying to install the Azure Defender extension on an Azure Arc-enabled cluster (the cluster is an AWS EKS cluster), and I get the following error during installation:

Error: 60m (x3 over 60m) Warning Failed Pod/microsoft-defender-collectors-sf9bk Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown

More events from the namespace mdc:

60m                      Normal    Pulled              Pod/microsoft-defender-publisher-gzsbg                                   Container image "mcr.microsoft.com/azuredefender/stable/security-publisher:1.0.102" already present on machine
60m (x3 over 60m)        Normal    Pulled              Pod/microsoft-defender-collectors-sf9bk                                  Container image "mcr.microsoft.com/azuredefender/stable/low-level-collector:2.0.40" already present on machine
60m (x3 over 60m)        Normal    Created             Pod/microsoft-defender-collectors-sf9bk                                  Created container pod-collector
60m (x3 over 60m)        Normal    Started             Pod/microsoft-defender-collectors-sf9bk                                  Started container pod-collector
60m (x3 over 60m)        Warning   Failed              Pod/microsoft-defender-collectors-sf9bk                                  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown
60m (x3 over 60m)        Normal    Created             Pod/microsoft-defender-collectors-sf9bk                                  Created container low-level-collector
60m (x3 over 60m)        Normal    Pulled              Pod/microsoft-defender-collectors-s84bg                                  Container image "mcr.microsoft.com/azuredefender/stable/pod-collector:1.0.98" already present on machine
60m (x3 over 60m)        Warning   Failed              Pod/microsoft-defender-collectors-s84bg                                  Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown
60m (x3 over 60m)        Normal    Created             Pod/microsoft-defender-collectors-s84bg                                  Created container low-level-collector
60m (x3 over 60m)        Normal    Pulled              Pod/microsoft-defender-collectors-s84bg                                  Container image "mcr.microsoft.com/azuredefender/stable/low-level-collector:2.0.40" already present on machine

● Container microsoft-defender-low-level-collector

Kubernetes Version: v1.29.0-eks-a5ec690 Amazon EKS Cluster.