aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

Generic Linux kernel incompatible with v2.3.1569.0? #302

Closed nouse closed 4 years ago

nouse commented 4 years ago

We have some EC2 instances running the 4.4 generic kernel and some running the 4.4 AWS kernel; the OS on both is Ubuntu 16.04.

Last Friday, after we updated the SSM Agent to v2.3.1569.0, instances running the generic kernel stopped responding after a system reboot. The system log obtained from the EC2 console only contains output from before the reboot. Instances running the AWS kernel were not affected. Rebooting the affected instances did not help, so we terminated and recreated them to solve the problem, and we pinned the agent at version 2.3.978.0.
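
If you script a check for which hosts have moved past a pinned agent version such as 2.3.978.0, compare the versions numerically rather than as strings. A minimal Python sketch follows. `parse_version` and `is_newer_than_pinned` are hypothetical helpers for illustration, not part of the SSM Agent or any AWS SDK.

```python
# Hypothetical helpers (not an AWS API): compare dotted SSM Agent versions numerically.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v2.3.1569.0' or '2.3.1569.0' into (2, 3, 1569, 0)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


PINNED = "2.3.978.0"  # version the reporter pinned to


def is_newer_than_pinned(installed: str, pinned: str = PINNED) -> bool:
    """True if the installed agent version is newer than the pinned one."""
    return parse_version(installed) > parse_version(pinned)


if __name__ == "__main__":
    # Plain string comparison is misleading: '9' > '1', so this is True.
    print("2.3.978.0" > "2.3.1569.0")
    print(is_newer_than_pinned("v2.3.1569.0"))  # True: past the pin
    print(is_newer_than_pinned("2.3.978.0"))    # False: at the pin
```

Converting each component to `int` before comparing tuples avoids the lexical trap where `978` sorts after `1569`.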

2020/08/14 13:24:21Z: Amazon SSM Agent v2.3.1569.0 is running
2020/08/14 13:24:21Z: OsProductName: Ubuntu
2020/08/14 13:24:21Z: OsVersion: 16.04
         Stopping Authenticate and Authorize Users to Run Privileged Tasks...
         Stopping ACPI event daemon...
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Timers.
[  OK  ] Stopped Daily apt upgrade and clean activities.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Stopped Message of the Day.
         Stopping Accounts Service...
         Stopping OpenTelemetry Collector Contrib...
[  OK  ] Stopped target Cloud-init target.
[  OK  ] Stopped Execute cloud user/final scripts.
[  OK  ] Stopped Apply the settings specified in cloud-config.
[  OK  ] Stopped target Cloud-config availability.
[  OK  ] Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
[  OK  ] Stopped target Multi-User System.
         Stopping Regular background program processing daemon...
         Stopping node_exporter.service...
         Stopping LSB: Start/stop sysstat's sadc...
         Stopping LSB: daemon to balance interrupts for SMP systems...
         Stopping LSB: Set the CPU Frequency Scaling governor to "ondemand"...
         Stopping Unattended Upgrades Shutdown...
         Stopping Service for snap applicati...on-ssm-agent.amazon-ssm-agent...
         Stopping rabbitmq-exporter.service...
         Stopping rn-misc-exporter.service...
         Stopping sensorcore_registration.service...
         Stopping LSB: MD monitoring daemon...
[  OK  ] Stopped Wait until snapd is fully seeded.
         Stopping filebeat.service...
         Stopping D-Bus System Message Bus...
         Stopping ffmpeg_server.service...
         Stopping LSB: Set up cgroupfs mounts....
         Stopping iptables_exporter.service...
         Stopping sensorcore_endpoint.service...
         Stopping LXD - container startup/shutdown...
         Stopping Deferred execution scheduler...
[  OK  ] Stopped ssh-host-keys.service.
         Stopping OpenBSD Secure Shell server...
         Stopping LSB: automatic crash report generation...
         Stopping spooler.service...
[  OK  ] Stopped target Login Prompts.
         Stopping Getty on tty1...
         Stopping rabbitmq.service...
         Stopping journalbeat.service...
         Stopping FUSE filesystem for LXC...
         Stopping Serial Getty on ttyS0...
         Stopping LSB: Record successful boot for GRUB...
         Stopping Snap Daemon...
         Stopping AWS X-Ray Daemon...
[  OK  ] Stopped Daily apt download activities.
[  OK  ] Unmounted /var/lib/lxcfs.
[  OK  ] Stopped ACPI event daemon.
[  OK  ] Stopped Regular background program processing daemon.
[  OK  ] Stopped iptables_exporter.service.
[  OK  ] Stopped Accounts Service.
[  OK  ] Stopped Deferred execution scheduler.
[  OK  ] Stopped AWS X-Ray Daemon.
[  OK  ] Stopped node_exporter.service.
[  OK  ] Stopped OpenBSD Secure Shell server.
[  OK  ] Stopped Unattended Upgrades Shutdown.
[  OK  ] Stopped Authenticate and Authorize Users to Run Privileged Tasks.
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Serial Getty on ttyS0.
[  OK  ] Stopped filebeat.service.
[  OK  ] Stopped journalbeat.service.
[  OK  ] Stopped OpenTelemetry Collector Contrib.
[  OK  ] Stopped spooler.service.
[  OK  ] Stopped sensorcore_endpoint.service.
[  OK  ] Stopped ffmpeg_server.service.
[  OK  ] Stopped sensorcore_registration.service.
[  OK  ] Stopped Snap Daemon.
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped LSB: Set up cgroupfs mounts..
[  OK  ] Stopped LSB: automatic crash report generation.
[  OK  ] Stopped FUSE filesystem for LXC.
[  OK  ] Stopped LSB: Start/stop sysstat's sadc.
[  OK  ] Stopped LSB: daemon to balance interrupts for SMP systems.
[  OK  ] Stopped LSB: Set the CPU Frequency Scaling governor to "ondemand".
[  OK  ] Stopped LSB: Record successful boot for GRUB.
[  OK  ] Stopped LXD - container startup/shutdown.
[  OK  ] Stopped LSB: MD monitoring daemon.
         Unmounting Mount unit for amazon-ssm-agent, revision 2648...
         Unmounting Mount unit for core, revision 9804...
         Unmounting Mount unit for core18, revision 1880...
         Unmounting Mount unit for core, revision 9665...
         Unmounting Mount unit for core18, revision 1885...
[  OK  ] Removed slice system-serial\x2dgetty.slice.
[  OK  ] Removed slice system-getty.slice.
[  OK  ] Stopped /etc/rc.local Compatibility.
         Stopping Permit User Sessions...
         Stopping Login Service...
[  OK  ] Stopped rn-misc-exporter.service.
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Unmounted Mount unit for core18, revision 1880.
[  OK  ] Unmounted /var/lib/docker/container...fb1e5683dcea1385d309/mounts/shm.
[  OK  ] Unmounted Mount unit for amazon-ssm-agent, revision 2648.
[  OK  ] Unmounted /var/lib/docker/overlay2/...1b30f0a915cec76c43f2aa0f/merged.
[  OK  ] Unmounted Mount unit for core18, revision 1885.
[  OK  ] Unmounted Mount unit for core, revision 9804.
[  OK  ] Stopped Login Service.
[  OK  ] Unmounted Mount unit for core, revision 9665.
[  OK  ] Stopped target User and Group Name Lookups.
[  OK  ] Stopped rabbitmq-exporter.service.
[  OK  ] Unmounted /var/lib/docker/container...09464762e6c42294844a/mounts/shm.
[  OK  ] Unmounted /var/lib/docker/overlay2/...05e4a02a3acb174eec87c875/merged.
[  OK  ] Stopped rabbitmq.service.
         Stopping Docker Application Container Engine...
[  OK  ] Stopped Docker Application Container Engine.
[  OK  ] Stopped chrony-wait.service.
         Stopping LSB: Controls chronyd NTP time daemon...
[  OK  ] Stopped LSB: Controls chronyd NTP time daemon.
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
         Stopping Login to default iSCSI targets...
[  OK  ] Stopped Login to default iSCSI targets.
         Stopping iSCSI initiator daemon (iscsid)...
[  OK  ] Stopped iSCSI initiator daemon (iscsid).
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped Service for snap applicatio...azon-ssm-agent.amazon-ssm-agent.
[  OK  ] Stopped target Network.
         Unmounting Mount unit for amazon-ssm-agent, revision 2758...
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped Forward Password Requests to Wall Directory Watch.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped ACPI Events Check.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed Docker Socket for the API.
[  OK  ] Closed UUID daemon activation socket.
[  OK  ] Closed ACPID Listen Socket.
[  OK  ] Closed Socket activation for snappy daemon.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Closed LXD - unix socket.
[  OK  ] Stopped target System Initialization.
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped target Encrypted Volumes.
         Stopping Load/Save Random Seed...
[  OK  ] Stopped target Swap.
[  OK  ] Stopped Initial cloud-init job (metadata service crawler).
         Stopping Raise network interfaces...
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Unmounted Mount unit for amazon-ssm-agent, revision 2758.
[  OK  ] Stopped Raise network interfaces.
[  OK  ] Stopped target Network (Pre).
[  OK  ] Stopped Initial cloud-init job (pre-networking).
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped target Local File Systems.
         Unmounting /run/docker/netns/default...
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Unmounted /run/docker/netns/default.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Reached target Shutdown.
[2106492.792350] reboot: Restarting system
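
The console log of an affected instance ends at the shutdown sequence, with nothing from the next boot. To triage several instances, you can pull the agent version, the OS fields, and the final line out of each saved log. A minimal Python sketch follows, assuming each log has already been saved as text (for example, copied from the EC2 console). `summarize_console_log` is a hypothetical helper, and its regular expressions only cover the line formats shown in the log above.

```python
import re

# Only the formats seen in the log above; other agent builds may log differently.
AGENT_RE = re.compile(r"Amazon SSM Agent v(\S+) is running")
FIELD_RE = re.compile(r"^\S+ \S+Z: (OsProductName|OsVersion): (.+)$")


def summarize_console_log(text: str) -> dict:
    """Hypothetical helper: extract agent/OS info and whether the log stops at reboot."""
    summary = {"agent_version": None, "OsProductName": None,
               "OsVersion": None, "last_line": None}
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    for line in lines:
        match = AGENT_RE.search(line)
        if match:
            summary["agent_version"] = match.group(1)
        match = FIELD_RE.match(line)
        if match:
            summary[match.group(1)] = match.group(2)
    if lines:
        summary["last_line"] = lines[-1].strip()
    # True when the final captured line is the kernel's reboot message.
    summary["ended_at_reboot"] = bool(lines) and lines[-1].endswith("reboot: Restarting system")
    return summary


SAMPLE = """2020/08/14 13:24:21Z: Amazon SSM Agent v2.3.1569.0 is running
2020/08/14 13:24:21Z: OsProductName: Ubuntu
2020/08/14 13:24:21Z: OsVersion: 16.04
[  OK  ] Reached target Shutdown.
[2106492.792350] reboot: Restarting system
"""

if __name__ == "__main__":
    print(summarize_console_log(SAMPLE))
```

A log that ends at the reboot message and shows no boot output afterwards matches the symptom described in this issue.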

danr-amz commented 4 years ago

We tried to reproduce this issue on Ubuntu Server 16.04, with kernel 4.4.233-0404233-generic obtained from here, and SSM Agent version 2.3.1569.0. However, the instance boots back up as expected after rebooting from either the local shell or from a RunCommand document.

Can you please provide us with the following information to help us reproduce the issue?

Thanks.

nouse commented 4 years ago

The kernel package is linux-image-4.4.0-187-generic, version 4.4.0-187.217. I will close this issue for now and try to reproduce it.
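
The two comments quote different kernel builds: 4.4.233-0404233-generic in the reproduction attempt and 4.4.0-187-generic on the affected hosts. Both use the generic flavor, but the base version and ABI fields differ, so the reproduction environment may not have matched the reporter's exactly. A minimal Python sketch that splits such strings into their parts for comparison follows. It assumes the common Ubuntu `X.Y.Z-ABI-flavor` layout, and `parse_kernel_release` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class KernelRelease(NamedTuple):
    base: tuple[int, int, int]  # e.g. (4, 4, 0)
    abi: str                    # ABI/build field, e.g. "187"
    flavor: str                 # e.g. "generic" or "aws"


# Assumed layout X.Y.Z-ABI-flavor; search() also accepts package names like linux-image-...
KERNEL_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)-([a-z0-9]+)")


def parse_kernel_release(release: str) -> KernelRelease:
    """Hypothetical helper: split a kernel release or package name into parts."""
    match = KERNEL_RE.search(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, abi, flavor = match.groups()
    return KernelRelease((int(major), int(minor), int(patch)), abi, flavor)


if __name__ == "__main__":
    reporter = parse_kernel_release("linux-image-4.4.0-187-generic")
    repro = parse_kernel_release("4.4.233-0404233-generic")
    print(reporter.flavor == repro.flavor)  # True: same flavor
    print(reporter.base == repro.base)      # False: different base version
```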