lorengordon closed this issue 2 years ago.
Well, that's interesting. I checked the Windows Event Logs and I see an event where the startup type of the Amazon SSM Agent service is changed from Disabled to Automatic (I'm not sure what makes that change; I guess it's just part of the Amazon Windows AMI?). But the service isn't started. If I start the service, the SSM Agent logs show up and the instance executes the Run Command to join the domain.
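Since starting the service by hand lets the domain join go through, one possible workaround is to have cfn-init start the agent itself before it reaches the wait step. The Python sketch below builds such an entry for the cfn-init "commands" key. It is an assumption-laden sketch, not a confirmed fix: the service name AmazonSSMAgent and the choice of PowerShell cmdlets are assumptions, so check the service name with Get-Service on your AMI first.

```python
import json

# Assumption: "AmazonSSMAgent" is the Windows service name behind the
# "Amazon SSM Agent" display name; verify it on your own AMI.
SSM_SERVICE = "AmazonSSMAgent"


def start_ssm_agent_command() -> dict:
    """Return a hypothetical cfn-init command entry that sets the SSM Agent
    service to Automatic and then actually starts it."""
    ps = (
        f"Set-Service -Name {SSM_SERVICE} -StartupType Automatic; "
        f"Start-Service -Name {SSM_SERVICE}"
    )
    return {
        "command": f'powershell.exe -NoProfile -NonInteractive -Command "{ps}"',
        # Starting a service does not reboot the instance, so no wait is needed.
        "waitAfterCompletion": "0",
    }


if __name__ == "__main__":
    print(json.dumps({"00-start-ssm-agent": start_ssm_agent_command()}, indent=2))
```

cfn-init processes entries under "commands" in alphabetical order by key, so the (hypothetical) "00-start-ssm-agent" name is chosen to sort before any wait-for-reboot step.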
The agent startup does not depend on cfn-init, so if logs are not appearing, the service hasn't started. If the service doesn't start, Systems Manager cannot deliver documents to the agent for execution, and the execution will eventually return a status of "Delivery Timed Out".
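To tell this delivery problem apart from an error inside the document itself, one can look at the per-instance invocation status. The sketch below filters a response shaped like the SSM ListCommandInvocations API (for example the dict returned by boto3's `ssm` client `list_command_invocations(CommandId=...)`) for invocations whose StatusDetails is "Delivery Timed Out". The function name is hypothetical and the sample instance IDs are made up.

```python
from typing import Any, Dict, List


def delivery_timeouts(response: Dict[str, Any]) -> List[str]:
    """Return the instance IDs whose invocation ended with StatusDetails
    "Delivery Timed Out", i.e. the agent never picked up the document."""
    return [
        inv["InstanceId"]
        for inv in response.get("CommandInvocations", [])
        if inv.get("StatusDetails") == "Delivery Timed Out"
    ]


# Sample data shaped like a ListCommandInvocations response (values are made up).
sample = {
    "CommandInvocations": [
        {"InstanceId": "i-0123456789abcdef0", "Status": "TimedOut",
         "StatusDetails": "Delivery Timed Out"},
        {"InstanceId": "i-0fedcba9876543210", "Status": "Success",
         "StatusDetails": "Success"},
    ]
}

if __name__ == "__main__":
    print(delivery_timeouts(sample))  # ['i-0123456789abcdef0']
```

In practice the response would come from boto3 and may need pagination; keeping the filter separate from the API call means it can be checked without AWS credentials.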
I am seeing some weird behavior, and I am hoping you can confirm what the SSM Agent is doing. I have a CFN template that uses an SSM Association to run the document AWS-JoinDirectoryServiceDomain. I am also using cfn-init Metadata to apply the rest of the instance configuration at launch time. This is a Windows instance, and to coordinate between the SSM Association and cfn-init, I have a simple step in the cfn-init config that waits for the instance to reboot. A waitAfterCompletion value of 'forever' makes cfn-init exit and resume after the SSM Association joins the domain and reboots the computer.

This has worked alright, except that sometimes the domain join fails, and that failure is not communicated back to CloudFormation, which makes it difficult to coordinate any error handling. So I was testing other values, like '1200', so that at least I could get it to fail faster. This is where I am seeing the weird behavior. The domain join never happens, and the Run Command fails with the mysterious error "Delivery Timed Out". If I log in to the system while testing this, I can see in the cfn-init log that it is waiting for the reboot for the requested 1200 seconds.
However, there are no logs from the SSM Agent at all. If I set the value back to 'forever', everything works fine. So my working theory is that the SSM Agent refuses to run when it detects that the cfn-init process is running. Is that true? I can't seem to find any documentation on this behavior, or on any interaction with cfn-init.