aws-solutions / instance-scheduler-on-aws

A cross-account and cross-region solution that allows customers to automatically start and stop EC2 and RDS Instances
https://aws.amazon.com/solutions/implementations/instance-scheduler-on-aws/
Apache License 2.0
542 stars 264 forks

DynamoDB State DB misreporting state of EC2 Instance #516

Closed (neilromatowski0373 closed this 6 months ago)

neilromatowski0373 commented 6 months ago

Hi,

I have come across an issue where the state table in DynamoDB shows two EC2 instances as running when they are actually stopped, so the table and the instances appear to be out of sync. I have removed the account from the management account. Will the state table be updated to remove the account reference? I was then going to re-add the account to see whether that corrects the state reported in DynamoDB. Is there anything else I need to consider or do? Also, if this has happened once, other accounts could be affected, and I am not sure how we would detect that.

Many thanks.

Neil.

hearde commented 6 months ago

The state table does not store the actual state of the instance. Unless you are troubleshooting a scheduling problem, I would not worry about a mismatch.

What unexpected scheduling behavior are you observing?

neilromatowski0373 commented 6 months ago

The EC2 instances have the correct schedule tag defined, and I can see StartInstances activity in CloudTrail, but the instances remain in a stopped state. I have also checked the logs on the Management Server and everything appears to be logging fine. I have attached a sample for review.

Many thanks.

Neil.

[Screenshot 2024-02-16 145629]

hearde commented 6 months ago

"last desired state was running" indicates that Instance Scheduler's last action on this instance was to start it, likely at the start of the "office-hours-uk" period. Since the actual state is stopped, someone has stopped the instance manually. The default behavior of the solution is to not interfere with instances that are stopped within a running period or started outside of one. If you want different behavior, consider the enforced flag.
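The behavior described above can be sketched as a small decision rule. This is only an illustrative model, assuming the scheduler compares the schedule's current desired state with the last desired state it recorded and with the instance's actual state. The names `State`, `Action`, and `decide_action` are hypothetical and are not taken from the solution's source code.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


class Action(Enum):
    START = "start"
    STOP = "stop"
    NONE = "none"


def decide_action(desired: State, last_desired: State, actual: State,
                  enforced: bool = False) -> Action:
    """Hypothetical model of the default vs. enforced behavior.

    desired:      what the schedule wants right now
    last_desired: what the scheduler last recorded as the desired state
    actual:       what EC2 reports for the instance
    """
    # Nothing to do when the instance already matches the schedule.
    if actual == desired:
        return Action.NONE
    # Default behavior (assumed): act only when the schedule itself changes
    # state, so manual stops/starts inside a period are left alone.
    if not enforced and desired == last_desired:
        return Action.NONE
    # Either the schedule changed state, or enforcement is on.
    return Action.START if desired is State.RUNNING else Action.STOP


# The case from this thread: last desired state was running, instance is stopped.
print(decide_action(State.RUNNING, State.RUNNING, State.STOPPED))
print(decide_action(State.RUNNING, State.RUNNING, State.STOPPED, enforced=True))
```

Under this model, the instance in the log is left alone by default because the schedule has not changed state since the scheduler last acted. With `enforced=True`, the same inputs yield a start action on a later scheduler run inside the period.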

neilromatowski0373 commented 6 months ago

Thank you for the clarification. Can I confirm that I just need to add a string value called enforced to the schedule and set it to true? That should then bring the instances back online on the next begin time cycle.

Many thanks.

hearde commented 6 months ago

I would recommend using automated tooling sourced from version control, like CloudFormation templates, but yes, that sounds right. Please reopen this issue if, after reviewing the implementation guide, the scheduler does not seem to be working as you expect.