iancward opened this issue 5 years ago
Thanks for reaching out to us. We will investigate this.
Was this ever resolved? I'm having a similar problem where I just see a black screen in the console; if I click, I get a cursor that doesn't do anything.
I am also having this issue
Hi, I'm experiencing this issue as well. Are there any updates on whether it is going to be fixed? It would be great to at least get a descriptive error message when a connection to an instance with no free disk space is impossible. Right now the AWS Console / CLI just hangs with no visible reason.
+1. Using the latest Ubuntu 18 AMI with the SSM agent installed and running, and the necessary SSM/CloudWatch policies attached to the instance role. Weirdest thing: it happens on some instances and not on others. Seems like a bug.
Unfortunately this is still happening. AWS, you should really do something here.
Hello,
We experience the same issue on Red Hat 7.7. We couldn't reach the instance through Session Manager once the /var partition was full.
On the other hand, we observed a different behavior: when the /var/log partition was full, the machine was still reachable. Either way, when we rely on Session Manager for remote access to a server, we expect to be able to reach the EC2 instance even if a filesystem is full.
In every case where Session Manager failed to open a connection due to a full disk, I was able to use SSH instead. We would like to remove SSH and switch solely to Session Manager, but that doesn't seem possible with longstanding issues like this, so we are keeping SSH as a backup: we open the ports and distribute the .pem key as needed during emergencies.
This is a bit of a circular problem, given that the documented way to check the available disk space on a volume is to open a terminal session on the instance. 🤦‍♀️ https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-describing-volumes.html
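One way out of the circularity (a sketch, not an official AWS recommendation): collect disk usage out-of-band, before a session is ever needed, with something as small as a standard-library Python check shipped ahead of time (cron, user data, or pushed to CloudWatch). The path and reporting mechanism here are assumptions for illustration:

```python
import shutil

def percent_used(path="/"):
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

# Run this out-of-band (e.g. from cron) and ship the number to your
# monitoring system, so you never need an interactive session just to
# learn whether a volume is about to fill up.
print(f"root volume {percent_used('/'):.1f}% used")
```

Paired with an alarm threshold, this catches the disk-full condition before Session Manager access is lost in the first place.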
Not sure if it's totally related, but I'm running into an issue with the start-session command hanging when the target instance is offline. As @iancward mentioned, Ctrl+C etc. does not exit; closing and re-opening the terminal is necessary.
I haven't had the opportunity to really dig into it, but I took a quick look at the code for the CLI, and found this:
https://github.com/aws/aws-cli/blob/master/awscli/customizations/sessionmanager.py
try:
    # ignore_user_entered_signals ignores these signals
    # because if signals which kills the process are not
    # captured would kill the foreground process but not the
    # background one. Capturing these would prevents process
    # from getting killed and these signals are input to plugin
    # and handling in there
    with ignore_user_entered_signals():
        # call executable with necessary input
        check_call(["session-manager-plugin",
                    json.dumps(response),
                    region_name,
                    "StartSession",
                    profile_name,
                    json.dumps(parameters),
                    endpoint_url])
    return 0
Looks like the termination signals are being swallowed intentionally? I'm not totally sure, but I reckon this ties into things 😄
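For anyone curious what a context manager like that boils down to, here is a minimal POSIX-only sketch (my reconstruction for illustration, not the actual aws-cli implementation): the user-entered signals are temporarily set to SIG_IGN so they pass through to the foreground session-manager-plugin process instead of killing the wrapping `aws` process.

```python
import os
import signal
from contextlib import contextmanager

@contextmanager
def ignore_user_entered_signals():
    # Temporarily ignore the signals a user can type at the terminal
    # (Ctrl+C -> SIGINT, Ctrl+\ -> SIGQUIT, Ctrl+Z -> SIGTSTP), restoring
    # the previous handlers on exit. POSIX only; must run in the main thread.
    sigs = (signal.SIGINT, signal.SIGQUIT, signal.SIGTSTP)
    saved = {s: signal.getsignal(s) for s in sigs}
    try:
        for s in sigs:
            signal.signal(s, signal.SIG_IGN)
        yield
    finally:
        for s in sigs:
            signal.signal(s, saved[s])

with ignore_user_entered_signals():
    # While ignored, a SIGINT delivered to this process is simply
    # discarded by the kernel instead of raising KeyboardInterrupt.
    os.kill(os.getpid(), signal.SIGINT)
    print("still running")
```

This would explain the behavior reported in the thread: if the plugin side hangs (say, the agent on a disk-full instance never completes the handshake), Ctrl+C is ignored by the wrapper for the lifetime of the `with` block, so killing the processes from another terminal is the only way out.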
Any updates on resolving this issue? I've run into the same problem.
Any updates please?
I've heard I should try to increase the volume size, but it's not clear if this will delete all data on the disk.
> I've heard I should try to increase the volume size, but it's not clear if this will delete all data on the disk.
Hi @twhetzel, increasing the volume size should not delete the data. However, after you increase the EBS volume size, you will need to access the instance and run a few commands to extend the file system to use the extra capacity. See the links below for Linux and Windows instances:
Windows: https://aws.amazon.com/premiumsupport/knowledge-center/expand-ebs-root-volume-windows/ or https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/recognize-expanded-volume-windows.html
Linux: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html
This is a very important issue. I had an EC2 instance that I used SSM to connect to; it had no SSH keys and was located in a private subnet. It ran out of space and was essentially "bricked" because SSM stopped working. SSM is a critical connectivity tool, and an instance becoming inaccessible for no good reason is a huge risk.
It is an important operational requirement to be able to log in to an instance whose root partition is full. If SSM Session Manager cannot handle this, SSH, which has no problem whatsoever under these conditions, is still needed as a backup method (with everything that implies).
@nitikagoyal87 Is there at least any type of workaround that we can apply to make SSM session manager work when a root volume is full? Given the time this has been open, is this being worked on?
I'm surprised to see this issue unaddressed. The EKS best practices docs even suggest disabling SSH and using SSM instead.
Losing all access to a host in a case like this can be extremely painful.
This is an issue that needs to be addressed.
Was pretty stunned to find out about this bug and how long it's been open for. SSM is a great tool and can replace SSH for us almost completely... except for this one critical issue blocking it.
If some pet server has failed and run out of space in a weird way, the last thing I want to spend time on is mounting the disk on another machine and expanding it just to get enough working space to boot and SSM into the host to figure out what is actually going wrong.
@nitikagoyal87 Was there ever an output of your initial investigation of this?
Thanks
The best solution I found for Linux boxes was to
HTH.
This is an absolutely critical bug. If SSM absolutely must use a disk, we should be able to set up a separate partition that keeps it working even when the rest of the system isn't. If SSM can't be relied on as a critical investigation tool, server admins will have to fall back on SSH, which increases complexity and security risk and goes against AWS best practices for EKS, to say the least. @VishnuKarthikRavindran, as you have been contributing most recently, is there a chance you could raise this issue with the product team so it gets priority?
As the previous comments already stated, it's critical that SSM keep working even if the disk is full. sshd has had this capability forever, and it's exactly in those situations that you need to rely on access.
Anything else, like SSH keys, is nowadays outdated and insecure, but seemingly SSM does not yet have the maturity to replace it properly.
I dunno if this is the appropriate place for this, but when I attempt to start a session (either via the AWS Console or the CLI, via the session-manager-plugin) onto an EC2 instance with a full root partition, it just hangs. In the console I get a blank screen for a long time and then eventually a blinking cursor that doesn't work.
In the CLI (via session-manager-plugin), I get a message that it's starting a session, but then it just hangs. The CLI/plugin doesn't respond to Ctrl+C or Ctrl+D; in fact, I have to start a new terminal on my workstation and kill both the CLI command and the plugin process.