aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [aws-guarduty-agent] [Bottlerocket]: guardduty addon not supported for bottlerocket #1996

Closed singhnix closed 1 year ago

singhnix commented 1 year ago


Tell us about your request

Since the GuardDuty addon has been released for EKS clusters, it has been observed that it does not work on Bottlerocket nodes.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Since the GuardDuty addon has been released for EKS clusters, it has been observed that it does not work on Bottlerocket nodes.

Are you currently working around this issue? No workaround.

Additional context

Below are the steps I followed to install the addon for my EKS cluster:

  1. https://docs.aws.amazon.com/guardduty/latest/ug/eks-runtime-monitoring-security-agent-manual.html#eksrunmon-deploy-security-agent
  2. After this, from the terminal:

kubectl get pods -n amazon-guardduty
NAME                        READY   STATUS             RESTARTS      AGE
aws-guardduty-agent-rq2fp   0/1     CrashLoopBackOff   7 (49s ago)   11m

  3. kubectl describe pods aws-guardduty-agent-rq2fp -n amazon-guardduty

Name:             aws-guardduty-agent-rq2fp
Namespace:        amazon-guardduty
Priority:         0
Node:             ip-192-168-126-188.ec2.internal/192.168.126.188
Start Time:       Wed, 05 Apr 2023 12:11:04 +0530
Labels:           app.kubernetes.io/name=aws-guardduty-agent
                  controller-revision-hash=5f98984754
                  pod-template-generation=1
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               192.168.126.188
IPs:
  IP:             192.168.126.188
Controlled By:    DaemonSet/aws-guardduty-agent
Containers:
  aws-guardduty-agent:
    Container ID:   containerd://9a0b0b09d783834b63cd9a2bc10a6230c7c7147e5b1e7b8a7088d3f69531d619
    Image:          031903291036.dkr.ecr.us-east-1.amazonaws.com/aws-guardduty-agent:v1.0.0
    Image ID:       031903291036.dkr.ecr.us-east-1.amazonaws.com/aws-guardduty-agent@sha256:e38bdd2b1323e89113f1a31bd4bc8e5a8098525dd98e6981a28b9906b1e4411e
    Port:
    Host Port:
    State:          Waiting
      Reason:       RunContainerError
    Last State:     Terminated
      Reason:       StartError
      Message:      failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't set process label: open /proc/thread-self/attr/exec: read-only file system: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 05:30:00 +0530
      Finished:     Wed, 05 Apr 2023 12:14:07 +0530
    Ready:          False
    Restart Count:  5
    Limits:
      memory:  1Gi
    Requests:
      memory:  256Mi
    Environment:
      CLUSTER_NAME:  eksdemo1
    Mounts:
      /proc from host-proc (ro)
      /run/containerd/containerd.sock from containerd-sock (ro)
      /run/docker.sock from docker-sock (ro)
      /sys/kernel/debug from host-kernel-debug (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hpthb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  docker-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:
  containerd-sock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/containerd/containerd.sock
    HostPathType:
  host-proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  host-kernel-debug:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/kernel/debug
    HostPathType:
  kube-api-access-hpthb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:
Tolerations:                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                 From               Message
  Normal   Scheduled  3m15s               default-scheduler  Successfully assigned amazon-guardduty/aws-guardduty-agent-sgqhp to ip-192-168-126-188.ec2.internal
  Normal   Pulling    3m14s               kubelet            Pulling image "031903291036.dkr.ecr.us-east-1.amazonaws.com/aws-guardduty-agent:v1.0.0"
  Normal   Pulled     3m9s                kubelet            Successfully pulled image "031903291036.dkr.ecr.us-east-1.amazonaws.com/aws-guardduty-agent:v1.0.0" in 4.984017648s
  Normal   Created    93s (x5 over 3m9s)  kubelet            Created container aws-guardduty-agent
  Warning  Failed     93s (x5 over 3m8s)  kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't set process label: open /proc/thread-self/attr/exec: read-only file system: unknown
  Normal   Pulled     93s (x4 over 3m7s)  kubelet            Container image "031903291036.dkr.ecr.us-east-1.amazonaws.com/aws-guardduty-agent:v1.0.0" already present on machine
  Warning  BackOff    92s (x9 over 3m6s)  kubelet            Back-off restarting failed container

  4. I tried to work around this by setting the /proc mount to readOnly: false, but the pod still failed with CrashLoopBackOff and the following error:

kubectl logs aws-guardduty-agent-xxxxx -n amazon-guardduty
2023-04-05T07:04:12.822273Z INFO amzn_guardduty_agent: GuardDuty agent starting with 8 worker thread(s) and 100 max blocking threads.
2023-04-05T07:04:12.984655Z INFO amzn_guardduty_agent: Agent fingerprint: f3962a7b731cfd20ce9570140f7d481102c18ca98465439ddb299a047f6ef95e
2023-04-05T07:04:12.985831Z ERROR amzn_guardduty_agent: Dependency check failed - Invalid kernel version 5.15.90
Error: Pipeline(DependencyError("Invalid kernel version 5.15.90"))

  5. Now, per the https://docs.aws.amazon.com/guardduty/latest/ug/guardduty-eks-runtime-monitoring.html#eksrunmon-verified-platform document, kernel 5.15 is not supported.

  6. Hence, I think a feature request should be created to support this addon on Bottlerocket and on kernel 5.15.
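For reference, here is a minimal sketch of how the prerequisites above could be checked and the agent installed via the managed addon route. Assumptions not taken from the original report: the AWS CLI and kubectl are configured for the cluster, the cluster name is eksdemo1 (as in the describe output above), and the EKS managed addon name is aws-guardduty-agent.

# List the agent versions published for this cluster's Kubernetes version
aws eks describe-addon-versions --addon-name aws-guardduty-agent \
  --kubernetes-version "$(aws eks describe-cluster --name eksdemo1 --query 'cluster.version' --output text)"

# Compare each node's KERNEL-VERSION column against the kernels listed as
# supported in the GuardDuty Runtime Monitoring documentation
kubectl get nodes -o wide

# Install the agent as an EKS managed addon and check its status
aws eks create-addon --cluster-name eksdemo1 --addon-name aws-guardduty-agent
aws eks describe-addon --cluster-name eksdemo1 --addon-name aws-guardduty-agent --query 'addon.status'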

mikestef9 commented 1 year ago

https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-guardduty-eks-monitoring-systems-processor/

kmute90 commented 11 months ago

I'm getting this error when starting the GuardDuty agent (v1.3.1-eksbuild.1) on an EKS cluster (1.28), on a node that runs Bottlerocket OS (1.16.1):

libbpf: failed to find valid kernel BTF 
libbpf: Error loading vmlinux BTF: -3 
libbpf: failed to load object 'patrol_bpf' 
libbpf: failed to load BPF skeleton 'patrol_bpf': -3

It seems like the kernel is missing something that the "patrol_bpf" BPF object needs in order to load.

weitsochen commented 11 months ago

@kmute90

You need your kernel compiled with CONFIG_DEBUG_INFO_BTF=y. I assume that should just work without any modification, but I still want to confirm.

You can check whether your kernel has BTF built in by:

# ls -la /sys/kernel/btf/vmlinux

Or by:

cat /boot/config-$(uname -r) | grep CONFIG_DEBUG_INFO_BTF

You should have the vmlinux file and have CONFIG_DEBUG_INFO_BTF=y.
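Bottlerocket does not allow SSH into the host by default, so if checking directly on the node is awkward, a rough cluster-side equivalent is a node debug pod. This is only a sketch, assuming kubectl debug node support in your cluster and reusing the node name from the describe output earlier in this issue:

kubectl debug node/ip-192-168-126-188.ec2.internal -it --image=busybox
# inside the debug container the host root filesystem is mounted at /host
ls -la /host/sys/kernel/btf/vmlinux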

kmute90 commented 11 months ago

Thank you for your response.

I checked in the Bottlerocket GitHub repo; the config option should be compiled in: https://github.com/bottlerocket-os/bottlerocket/pull/799

I ran the commands that you suggested and the results were:

[root@admin]# ls -la /sys/kernel/btf/vmlinux
-r--r--r--. 1 root root 4601363 Nov 23 09:49 /sys/kernel/btf/vmlinux
bash-5.1# cat /boot/config | grep CONFIG_DEBUG_INFO_BTF
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y

weitsochen commented 11 months ago

@kmute90 We have internally reproduced and fixed the issue, and it will come with the next aws-guardduty-agent release.
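One way to watch for that release, assuming the agent is run as the EKS managed addon, is to list the published addon versions for the cluster's Kubernetes version (1.28 here, matching the cluster mentioned above):

aws eks describe-addon-versions --addon-name aws-guardduty-agent --kubernetes-version 1.28 \
  --query 'addons[].addonVersions[].addonVersion'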

kmute90 commented 11 months ago

Nice thank you!! Do you have an ETA for the next release?

joebowbeer commented 11 months ago

@weitsochen wrote:

We have internally reproduced and fixed the issue, and it will come with the next aws-guardduty-agent release.

Can you provide more details about the issue you fixed?

Does it manifest in the official AWS images?

Is it specific to EKS 1.28?