aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0

Amazon SSM agent is still not cleaning up the "orchestration" folder properly #471

Open hexsel opened 2 years ago

hexsel commented 2 years ago

https://github.com/aws/amazon-ssm-agent/issues/94

I'm still having this issue; we're on SSM Agent 3.1.1767.0.

Our /var/lib/amazon/ssm/i-[instance id]/document/orchestration/ folder had almost half a million entries (we use SSM for some health checks).

cjinaws commented 2 years ago

Do you mind providing the agent configuration?

hexsel commented 2 years ago

I don't think we modified it: there is no /etc/systemd/system/amazon-ssm-agent.service.d/ folder, and the /etc/init/amazon-ssm-agent.conf file contains only defaults (contents below). Is there any other file I am missing?



# Copyright 2016 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You may not
# use this file except in compliance with the License. A copy of the
# License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is distributed
# on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language governing
# permissions and limitations under the License.

description     "Amazon SSM Agent"
author          "Amazon.com"

start on (runlevel [345] and started network)
stop on (runlevel [!345] or stopping network)

respawn

exec /usr/bin/amazon-ssm-agent

cavricks commented 1 year ago

I can confirm this; it also happens here.

sam-fakhreddine commented 1 year ago

Can confirm this is happening

RavirajanS commented 1 year ago

Is this issue resolved? We are using version 2.3.274.0 of the SSM agent, and it still does not clean up the orchestration directory on the server.

mikeradka commented 5 months ago

Has this issue been resolved? We're encountering the same problem - a proliferation of repeated entries, steadily consuming disk space. As a point of reference, the instance's disk space usage increased by approximately 19% over the past week:

# du -cha --max-depth=2 /var/lib/amazon/ssm/<i-nnn>/document/orchestration/

8.0K    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/installLinuxAgents
520M    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/downloads
24K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsrunShellScript
16K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsdownloadContent
8.0K    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/installLinuxAgents
520M    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/downloads
24K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsrunShellScript
16K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsdownloadContent
8.0K    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/installLinuxAgents
136M    /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/downloads
24K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsrunShellScript
16K     /var/lib/amazon/ssm/<i-nnn>/document/orchestration/<uid>/awsdownloadContent

We operate a small yet essential instance, and this issue is significantly impacting our disk space. Would it be safe to delete the older repeated entries and retain only the most recent ones? Additionally, does SSM offer an auto-clean feature to address this?
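Whether removing old entries by hand is safe is not confirmed anywhere in this thread, so a cautious first step is a dry run that only reports which entries are old. The following is a minimal Python sketch, assuming each direct child of the orchestration folder is one execution's directory and that its modification time roughly reflects when it ran. The function name `stale_entries` and the 7-day default are hypothetical choices for illustration, and the code deletes nothing:

```python
import time
from pathlib import Path


def stale_entries(orchestration_dir, max_age_days=7.0, now=None):
    """Return child directories last modified more than max_age_days ago.

    Dry run only: this lists candidates and never removes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    root = Path(orchestration_dir)
    # Assumption: every direct child directory belongs to one execution.
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.stat().st_mtime < cutoff
    )
```

Calling `stale_entries("/var/lib/amazon/ssm/<i-nnn>/document/orchestration/")` with the real instance path and printing the result gives a list to review before deciding on any cleanup.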

joshuarussell76 commented 2 months ago

Can confirm this is still an issue.

Does anyone have a suggestion on cleanup?

lokesh-soni commented 1 month ago

Because of this, we run into disk-full issues with long-running sessions and processes running in the background or in a shell. Any updates on this? Agent version: 3.3.131.0

philippeouellette commented 1 month ago

I don't understand why nobody on the project has shared this information, but per /etc/amazon/ssm/README.md:

* OrchestrationDirectoryCleanup (string) - Configure only when it is safe to delete orchestration folder after document execution. This config overrides PluginLocalOutputCleanup when set.    
        * Default: "" - Don't delete orchestration folder after execution
        * OptionalValue: "clean-success" - Deletes the orchestration folder only for successful document executions.
        * OptionalValue: "clean-success-failed" - Deletes the orchestration folder for successful and failed document executions.

So, with a config like this:

[...]
    "Ssm": {
        "Endpoint": "",
        "HealthFrequencyMinutes": 5,
        "CustomInventoryDefaultLocation" : "",
        "AssociationLogsRetentionDurationHours" : 24,
        "RunCommandLogsRetentionDurationHours" : 336,
        "SessionLogsRetentionDurationHours" : 336,
        "PluginLocalOutputCleanup": "",
        "OrchestrationDirectoryCleanup": "clean-success-failed"
    },
[...]

I am now getting:

[root@xxxxxxxxxxxxx ssm]# grep cleanup /var/log/amazon/ssm/amazon-ssm-agent.log
2024-08-13 18:09:53 INFO [ssm-document-worker] [5cf30fad-88ef-4b8d-8097-b103a283c348] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/5cf30fad-88ef-4b8d-8097-b103a283c348
2024-08-13 18:10:53 INFO [ssm-document-worker] [527446b6-33a0-425e-b624-c1f09b5e96b9] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/527446b6-33a0-425e-b624-c1f09b5e96b9
2024-08-13 18:11:53 INFO [ssm-document-worker] [2b954500-decc-4cb8-a011-86f308594ae6] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/2b954500-decc-4cb8-a011-86f308594ae6
2024-08-13 18:12:53 INFO [ssm-document-worker] [7429c5fc-0ef4-491a-9866-407e02ac90ef] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/7429c5fc-0ef4-491a-9866-407e02ac90ef
2024-08-13 18:13:53 INFO [ssm-document-worker] [6fadc1ed-23fd-42ae-b373-3eec458ad3cd] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/6fadc1ed-23fd-42ae-b373-3eec458ad3cd
2024-08-13 18:14:53 INFO [ssm-document-worker] [26816d96-07da-47ef-bc52-6b15fb156f69] [DataBackend] orchestration cleanup started for the command with status Success - deleting orchestration directory: /var/lib/amazon/ssm/i-xxxxxxxxxxxxx/document/orchestration/26816d96-07da-47ef-bc52-6b15fb156f69
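The configuration change described above can also be scripted. The following is a minimal Python sketch, assuming the agent reads its settings from a JSON file such as /etc/amazon/ssm/amazon-ssm-agent.json (that path is an assumption; the thread only gives the README location) and that the agent needs a restart to pick up the change. The helper name `set_orchestration_cleanup` is hypothetical:

```python
import json
from pathlib import Path

# Values quoted from the README excerpt above.
ALLOWED = {"", "clean-success", "clean-success-failed"}


def set_orchestration_cleanup(config_path, value="clean-success-failed"):
    """Set Ssm.OrchestrationDirectoryCleanup in a JSON config file.

    Returns the updated configuration as a dict.
    """
    if value not in ALLOWED:
        raise ValueError(f"unsupported value: {value!r}")
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Create the "Ssm" section if the file does not have one yet.
    config.setdefault("Ssm", {})["OrchestrationDirectoryCleanup"] = value
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

The helper leaves all other keys untouched and rejects values that the README does not list, so a typo cannot end up in the config file unnoticed.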

l-rossetti commented 1 day ago

Any news on this?