For what it is worth, I currently work around this by having a UserData section in the worker nodegroup creation that looks like this:
UserData:
  Fn::Base64:
    !Sub |
      #!/bin/bash -xe
      yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
You can also run the SSM agent as a DaemonSet, as per https://github.com/mumoshu/kube-ssm-agent, but I am not that fond of running containers with hostNetwork: true and privileged: true. Installing SSM with UserData also allows you to debug startup issues in case the node never joins the cluster.
But it would definitely be convenient to have the SSM agent pre-installed in the AMI and have a flag available to start it (or not) via the bootstrap script.
@dlaidlaw thanks, but that won't work for managed node_groups AFAIK: https://github.com/aws/containers-roadmap/issues/596
@davidkarlsen Understood. This is one of the reasons we do not use managed workers. Another important one being the requirement to harden the instance as per CIS Hardening Blueprints. Some people also like to have vulnerability scanning agents and anti-virus software installed.
We recently introduced EKS with Managed Node Groups in our company, but now we are stuck without SSM. We don't allow SSH in our organization, and the only way to manage nodes is via SSM with federated users. At the very least, please provide a prepackaged image with the SSM Agent if we cannot use user data.
@Viren you could work around that by deploying the SSM agent as a DaemonSet until managed node groups allow customizations.
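For anyone who wants to try that route, a minimal sketch of such a DaemonSet is below. It assumes the plain amazonlinux:2 image, installs the agent from the AL2 repos at container start, and uses the hostNetwork/privileged settings mentioned above; names and settings are illustrative, not a vetted manifest, and a fuller host-install variant appears later in this thread.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ssm-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: ssm-agent
  template:
    metadata:
      labels:
        app: ssm-agent
    spec:
      hostNetwork: true
      containers:
        - name: ssm-agent
          image: amazonlinux:2
          securityContext:
            privileged: true
          command:
            - sh
            - -c
            # Install the agent from the AL2 repo, then run it in the foreground
            - yum install -y amazon-ssm-agent && exec amazon-ssm-agent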
With the release of the official EKS Best Practices Guide for Security, I hope this issue will get more attention: https://aws.github.io/aws-eks-best-practices/hosts/#minimize-access-to-worker-nodes
As mentioned in all the other issues the same as this one, rather than forcing everyone to run SSM agent by putting it into the AMI, just install it using daemonset because:
1. It's a more "k8s" way to do it
2. CPU/memory resources are accounted for properly in the cluster
3. You can manage updates to SSM agent nicely
https://github.com/mumoshu/kube-ssm-agent
Also, would be cool to create a chart for this tool and put it in eks-charts
I don't think folks are forcing everyone to run SSM. We're looking for an option to enable it.
@max-rocket-internet While I like the daemonset idea, it can't offer the full utility of the SSM Agent, specifically the ability to use State Manager to configure aspects of the host. For that to work, the container would need access to the host's root filesystem. Of course, you could create a host mount from / to /mnt (or similar) in the container, but State Manager can't currently be configured to chroot into an alternate root filesystem.
I don't think folks are forcing everyone to run SSM
It was mentioned putting it in the AMI, that part I'm not keen on 🙂
We're looking for an option to enable it.
Fair enough! Makes sense. I think this is covered in https://github.com/aws/containers-roadmap/issues/596
For that to work, the container would need access to the host's root filesystem
I'm not super familiar with State Manager but many daemonsets mount host directories. It's very common for system management tools, e.g. log collectors mount /var/log/pods and /var/lib/docker/containers, Sysdig mounts /proc, /dev, /boot, /lib/modules, etc.
State Manager can't currently be configured to chroot
I don't think chroot is involved when running a container with host directories mounted. They are just mounted into the container like --volume in docker. I could be wrong though 🙂
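For illustration, a hostPath mount of the kind log collectors use looks roughly like this (a hypothetical pod spec fragment; the image name is a placeholder):

# Roughly equivalent to `docker run -v /var/log/pods:/var/log/pods:ro ...`
volumes:
  - name: pod-logs
    hostPath:
      path: /var/log/pods
containers:
  - name: collector
    image: example/log-collector  # placeholder
    volumeMounts:
      - name: pod-logs
        mountPath: /var/log/pods
        readOnly: true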
@max-rocket-internet Ordinarily, the SSM Agent expects that it is running on the host, not in a container. And so when synchronizing state according to SSM Documents per State Manager, it expects all paths on which it is operating to be host paths: /usr is the host's /usr, /etc is the host's /etc, and so on.
As you mention, you can mount host volumes on a case-by-case basis into a container. But you can't mount the entire host filesystem as-is at the root of the container (i.e., / on the host is / inside the container). You could mount the host's root volume into a container as, say, /host, then chroot /host and it would look like you were then in the host's root volume - but SSM Agent doesn't support such behavior right now.
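To make that concrete, the /host-style mount being described would look something like the fragment below (illustrative only; as noted, nothing in the SSM Agent will chroot into it today):

# Hypothetical pod spec fragment: the whole host filesystem mounted at /host
volumes:
  - name: host-root
    hostPath:
      path: /
containers:
  - name: agent
    image: amazonlinux:2
    volumeMounts:
      - name: host-root
        mountPath: /host
    # A shell in this container could run `chroot /host` to see the host's
    # root filesystem, but the SSM Agent has no option to operate that way.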
Running as a DaemonSet won't help if you're trying to debug an issue of the node not joining the cluster. The node will not have received the DaemonSet spec from the kube-apiserver.
Until the current issue (#593), #596, and #585 have been addressed, managed node groups are not an option for clusters that have both a no-ssh security requirement and a requirement for remote terminal access to the nodes via SSM.
It would be helpful to add a warning about this to the Managed Node Group documentation.
Hey all,
Our aim is to keep the EKS AMI as minimal as possible. Given managed node groups now support launch templates #585 and custom user data #596, it's straightforward to install the SSM agent at node boot time. In fact, it's the exact example we used in the launch blog.
Is there still an ask for the agent to be baked into the AMI, or is user data support sufficient to meet the feature request as outlined in this issue?
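For anyone looking for a starting point, a minimal user data for such a launch template might look like the sketch below. It assumes the EKS-optimized AL2 AMI (where launch template user data has to be in MIME multi-part format) and that the agent package is available from the AL2 repos; if not, the S3 RPM URL earlier in the thread works too.

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Install and start the SSM agent at node boot time
yum install -y amazon-ssm-agent
systemctl enable --now amazon-ssm-agent

--==BOUNDARY==--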
My solution is a daemonset that installs a systemd unit on the host which installs the SSM agent (and a couple other configurations we need):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: eks-host-config
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: eks-host-config
  template:
    metadata:
      name: eks-host-config
      labels:
        app: eks-host-config
    spec:
      initContainers:
        - name: ssm-install-unit
          image: debian:buster
          command:
            - sh
            - -c
            - |
              set -x
              # Add unit file to install the SSM agent
              cat >/etc/systemd/system/install-ssm.service <<EOF
              [Unit]
              Description=Install the SSM agent
              [Service]
              Type=oneshot
              ExecStart=/bin/sh -c "yum install -y amazon-ssm-agent; systemctl enable amazon-ssm-agent; systemctl start amazon-ssm-agent"
              [Install]
              WantedBy=multi-user.target
              EOF
              systemctl daemon-reload
              systemctl enable install-ssm.service
              systemctl start install-ssm.service
              # Add unit file to increase inotify watches. Some CI jobs which use inotify fail without this.
              cat >/etc/systemd/system/increase-inotify-watches.service <<EOF
              [Unit]
              Description=Increase inotify watches
              [Service]
              Type=oneshot
              ExecStart=/bin/sh -c "echo 'fs.inotify.max_user_watches = 524288' >/etc/sysctl.d/50-increase-inotify-watches.conf; sysctl --system"
              [Install]
              WantedBy=multi-user.target
              EOF
              systemctl daemon-reload
              systemctl enable increase-inotify-watches.service
              systemctl start increase-inotify-watches.service
              # Enable the docker bridge so that docker-in-docker
              # (dind) works for CI operations. This is equivalent to
              # setting --enable-docker-bridge in the EKS userdata
              # script. See https://github.com/awslabs/amazon-eks-ami/commit/0db49b4ed7e1d0198f9c1d9ccaab3ed2ecca8cd0
              cat >/etc/systemd/system/enable-docker-bridge.service <<EOF
              [Unit]
              Description=Enable the docker bridge
              [Service]
              Type=oneshot
              ExecStart=/bin/sh -c "if ! grep docker0 /etc/docker/daemon.json; then cp /etc/docker/daemon.json /tmp/; jq '.bridge=\"docker0\" | .\"live-restore\"=false' </tmp/daemon.json >/etc/docker/daemon.json; fi; if ! ip link show docker0; then systemctl restart docker; fi"
              [Install]
              WantedBy=multi-user.target
              EOF
              systemctl daemon-reload
              systemctl enable enable-docker-bridge.service
              systemctl start enable-docker-bridge.service
          volumeMounts:
            - name: etc-docker
              mountPath: /etc/docker
            - name: run-systemd
              mountPath: /run/systemd
            - name: etc-systemd
              mountPath: /etc/systemd/system
            - name: libgcrypt
              mountPath: /usr/lib/x86_64-linux-gnu/libgcrypt.so.11
            - name: bin-systemctl
              mountPath: /bin/systemctl
          resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
      containers:
        - name: pause
          image: gcr.io/google_containers/pause
      volumes:
        # docker config for enabling the bridge
        - name: etc-docker
          hostPath:
            path: /etc/docker
            type: Directory
        # systemd's runtime socket
        - name: run-systemd
          hostPath:
            path: /run/systemd
            type: Directory
        # location for custom unit files
        - name: etc-systemd
          hostPath:
            path: /etc/systemd/system
            type: Directory
        # systemctl command + required lib
        - name: libgcrypt
          hostPath:
            path: /lib64/libgcrypt.so.11
            type: File
        - name: bin-systemctl
          hostPath:
            path: /bin/systemctl
            type: File
It's not super elegant but it seems to work so far and it keeps EKS satisfied that my node groups are still upgradeable.
@mikestef9 As soon as we create a new version of the launch template with customized userdata, the EKS console denies us the AMI upgrades since our configuration has diverged.
From our docs, "Existing node groups that do not use launch templates cannot be updated directly. Rather, you must create a new node group with a launch template to do so."
@mikestef9 as I mentioned on another issue about SSM and EKS AMIs, not having the SSM agent is inconsistent both with the default AL2 AMI and with the stated intention to "keep the EKS AMI as minimal as possible".
The EKS AMI still includes an SSH server. I could buy the "minimal" argument if the idea is also to remove the SSH server and include no remote access tooling at all unless the user installs it.
If it feels wrong to remove SSH (and my guess is it will), then we have to ask why. Is it just about remote access? Then the default option provided should be the most auditable and most secure access route, which seems at this point to be SSM. Unless it's not?
@mikestef9 great news, I've been waiting for a while for launch templates on managed node groups, will give it a try with the SSM agent.
I want to add that DaemonSets aren't the solution.
We want SSM for SSH replacement, inventory, hardening, and patch sets.
The problem with daemonsets is that it runs a container. Hardening? Hardens the container. Inventory? Inventory of the container.
Using an alternative daemonset to run host-level stuff requires permissions that are not feasible and is subject to race conditions.
The problem with daemonsets is that it runs a container. Hardening? Hardens the container. Inventory? Inventory of the container.
This is not necessarily true. Privileged containers can escape to the host by calling nsenter -t 1. And, in fact, this is exactly how tools like https://github.com/kvaps/kubectl-node-shell work.
As long as the entrypoint for the agent does the right thing, it should work just fine.
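As a sketch of what such an entrypoint could do (assuming a privileged container running with hostPID: true so that PID 1 is the host's init; none of this is taken from the linked tool itself):

#!/bin/sh
# Hypothetical DaemonSet entrypoint: run the install on the host, not in the container
set -e
# Enter the host's mount, UTS, IPC, network and PID namespaces via the init process
nsenter -t 1 -m -u -i -n -p -- sh -c '
  yum install -y amazon-ssm-agent
  systemctl enable --now amazon-ssm-agent
'
# Keep the pod alive so Kubernetes does not restart it in a loop
exec sleep infinity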
None of this is a good replacement for simply adding SSM to the AMI, considering the base AL2, ECS, Beanstalk, and other AMIs all have it ready to go.
Aside from that, the daemonset is hacky anyway: it mounts crontab and injects a per-minute cron job to install the RPM, meaning that it's constantly attempting the install even when it doesn't need to:
command: ["/bin/bash","-c","echo '* * * * * root rpm -e amazon-ssm-agent-2.3.1550.0-1.x86_64 && yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm && systemctl restart amazon-ssm-agent && rm -rf /etc/cron.d/ssmstart' > /etc/cron.d/ssmstart && echo 'Successfully installed SSM agent'"]
from here
We will be adding the SSM agent to EKS AL2 AMI in a future release, moved to "We're working on it"
I've run into a workload on our cluster that messes with the node enough that kubelet dies and the node enters NodeNotReady state. Not sure why yet, but it's not super relevant for this comment. I'm noting it here because when using a managed node group with no ssh key there's literally no way to access the node that I'm aware of to debug the issues and having SSM installed as an escape hatch would be great. That's just to say that I'm looking forward to the agent being installed by default!
Hey all,
The SSM agent is now installed and enabled by default in the latest release of the EKS Optimized Amazon Linux AMI
https://github.com/awslabs/amazon-eks-ami/releases/tag/v20210621
The new managed EKS workers (https://aws.amazon.com/fr/about-aws/whats-new/2019/11/amazon-eks-adds-support-for-provisioning-and-managing-kubernetes-worker-nodes/) can only be managed using an SSH key.
It would be much more flexible if we could use SSM to connect to those upon need. Until now, we were able to install it using a UserData script, but this is not an option anymore for managed workers.
Tell us about your request: Manage EKS workers through the SSM agent
Which service(s) is this request for? EKS (managed workers)
Are you currently working around this issue? No workaround, besides not using the service