aws / amazon-ssm-agent

An agent to enable remote management of your EC2 instances, on-premises servers, or virtual machines (VMs).
https://aws.amazon.com/systems-manager/
Apache License 2.0
1.06k stars, 326 forks

Feature request: make `maxBackOffInterval` configurable #491

Open hencrice opened 1 year ago

hencrice commented 1 year ago

In some cases, our on-premises hosts boot up without internet connectivity. Once the SSM agent enters hibernate mode, it becomes increasingly hard to get it to resume active mode, even after connectivity is restored.

Please consider making the maxBackOffInterval below configurable.

https://github.com/aws/amazon-ssm-agent/blob/44665b7ca49ae3d5e302a57d0931edea6d8e4771/agent/hibernation/hibernation.go#L54

Thanks!

timharris777 commented 1 year ago

We are having the same issue. Our boxes can be installed without an internet connection. We just ran into a box that, because of the exponential backoff, waited six hours to connect after the internet connection came up. Being able to set a `maxBackOffInterval` would be awesome.

sluggard76 commented 1 year ago

@hencrice

We have created a feature request. Please note that we have a backlog of feature requests. We'll prioritize and work on those requests as they come in.

timharris777 commented 1 year ago

@sluggard76 , how do we track the progress of the feature request?

strophy commented 5 months ago

@sluggard76 can you let us know the status of this feature request? We have an edge device with occasional network failures, and everything except SSM Agent reconnects promptly when network connectivity is restored. The exponential backoff takes too long to retry; we need to be able to limit it somehow.

Why did you close this issue as completed if it isn't actually done? Doesn't that defeat the purpose of a public issue tracker?

strophy commented 4 months ago

I purchased a support subscription with AWS and opened a ticket (ID 171897447000512) to try and determine the status of this feature request, and additionally asked the support agent to ask the development team to stop closing issues as completed when they aren't completed.

@sluggard76 do you still work at AWS? Can you please respond?

Related prematurely closed issues: https://github.com/aws/amazon-ssm-agent/issues/468 https://github.com/aws/amazon-ssm-agent/issues/479

strophy commented 3 months ago

ssm-agent 3.3.808.0 was released today and includes a fix described as "Make long sleep for onprem same as long sleep for EC2, and cap sleep time at 30 minutes for OnPrem instances", after the fix was successfully requested via AWS support.

I don't have time to test it now but it looks like this is what we have been asking for, can anyone verify?

mochaslave commented 3 months ago

> ssm-agent 3.3.808.0 was released today and includes a fix described as "Make long sleep for onprem same as long sleep for EC2, and cap sleep time at 30 minutes for OnPrem instances", after the fix was successfully requested via AWS support.
>
> I don't have time to test it now but it looks like this is what we have been asking for, can anyone verify?

I can't find any related parameter in the amazon-ssm-agent.json.template file. Does that mean the default cap has been changed from 24 hours to 30 minutes, but it is still not configurable?

strophy commented 3 months ago

It looks like it was done in this commit: https://github.com/aws/amazon-ssm-agent/commit/d76f19c96be9d9c88baa14238b5ae467690ecc75

It seems to be mostly changes to the maximum durations and to how the backoff is calculated, with no configurable variable exposed.

mochaslave commented 3 months ago

> It looks like it was done in this commit: d76f19c
>
> It seems to be mostly changes to the maximum durations and to how the backoff is calculated, with no configurable variable exposed.

Then it's not so helpful. :(

Aperocky commented 1 month ago

I see that this has not been implemented and is a valid feature request, so I am reopening it.

We will look into implementing this feature and potentially incorporating it into our roadmap.