Closed damonmaria closed 1 year ago
We faced the same issue and we have a confirmation from AWS support that there is a feature of exponential backoff. Having ability to control this behaviour would much needed.
This is constantly catching us out as our servers operate at the edge with intermittent Internet access. Surely the fix cannot be hard.
We need this to be configurable as well. Here are other issues for the same thing:
We've gone to the extent of having a process poll the SSM log, and if it sees the last line has 'Sleeping' then it restarts the ssm service.
We've gone to the extent of having a process poll the SSM log, and if it sees the last line has 'Sleeping' then it restarts the ssm service.
Can you maybe provide the solution you build to circumvent this?
Its really stupid that its setup by default this way lol.
Can you maybe provide the solution you build to circumvent this?
We do this using Puppet so unlikely it will apply to your situation. But here's the check we do to determine if we need to restart the service:
/bin/journalctl -b -u amazon-ssm-agent.service | /bin/tail -n1 | /bin/grep Sleeping
Just received confirmation from the AWS team that they are working on this with a target release of late Q3/early Q4.
We have created a feature request accordingly. Please note that we have a backlog of feature requests. We'll prioritize and work on those requests as they come in.
+1 for this feature.
We lost Internet connection on our edge servers. When Internet was restored the hybrid SSM agent didn't reconnect. Looking through the logs it seems to be using exponential backoff but ends up waiting for ridiculous amounts of time.
Can there be a maximum retry wait of 5 minutes, or something a bit more sane?