NextChapterSoftware / ec2-action-builder

This is a custom GitHub action to provision and manage self-hosted runners using AWS EC2 On-Demand and/or Spot instances.
Apache License 2.0

Hung or stuck instances not torn down #38

Open MaxDiOrio opened 1 month ago

MaxDiOrio commented 1 month ago

When a build fails and the EC2 instance doesn't run the shutdown script, the instance never seems to be cleaned up. In the run below, the job timed out waiting for the self-hosted runner to register.

Ec2 spot instance strategy is set to none
Starting instance with none strategy
AWS EC2 instance i-01967a62320981c42 is up and running
Waiting 30s before polling for runner
Polling for runner every 10s
Waiting...
Waiting...
...
Waiting...
Error: The operation was canceled.

And the instance remained up.
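
One way to cover the timeout case described above would be for the polling loop itself to terminate the instance when the runner never registers. The following is a minimal sketch, not the action's actual code: `waitForRunnerOrTerminate`, `isRunnerOnline`, and `terminateInstance` are hypothetical names, and the two callbacks stand in for whatever GitHub and EC2 API calls the action makes.

```typescript
// Hypothetical helper (not part of the action): poll until the runner
// registers, and terminate the instance if it never does.
type PollOptions = {
  timeoutMs: number; // total time to wait for registration
  intervalMs: number; // delay between polls
};

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitForRunnerOrTerminate(
  isRunnerOnline: () => Promise<boolean>, // stand-in for a GitHub API check
  terminateInstance: () => Promise<void>, // stand-in for an EC2 terminate call
  { timeoutMs, intervalMs }: PollOptions,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let registered = false;
  try {
    while (Date.now() < deadline) {
      if (await isRunnerOnline()) {
        registered = true;
        break;
      }
      await sleep(intervalMs);
    }
  } finally {
    // Runs on timeout and on polling errors, but not after a successful registration.
    if (!registered) {
      await terminateInstance();
    }
  }
  return registered;
}
```

This only helps while the action process is still running. If the workflow job itself is canceled, as the `The operation was canceled` error suggests, cleanup code in the action may not get a chance to run.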

mahdi-torabi commented 1 month ago
      // Job-completed hook: power off one minute after the job finishes.
      `echo "shutdown -P +1" > $CURRENT_PATH/shutdown_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_script.sh",
      // Immediate shutdown: deregister the runner, then halt.
      `echo "./config.sh remove --token ${runnerRegistrationToken.token} || true" > $CURRENT_PATH/shutdown_now_script.sh`,
      // Append (>>) so the deregistration line above is kept.
      `echo "shutdown -h now" >> $CURRENT_PATH/shutdown_now_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_now_script.sh",
      "export ACTIONS_RUNNER_HOOK_JOB_COMPLETED=$CURRENT_PATH/shutdown_script.sh",

I just tested with a job that had an error intentionally introduced to make it fail. The instance was terminated exactly 1 minute after the failure.
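
The `ACTIONS_RUNNER_HOOK_JOB_COMPLETED` hook only fires once a job has finished, so it would not cover a runner that never registers. A possible safety net, sketched here under stated assumptions, is to schedule a shutdown at boot and cancel it when a job starts. `buildWatchdogLines`, `maxIdleMinutes`, and `cancel_shutdown_script.sh` are hypothetical names; `ACTIONS_RUNNER_HOOK_JOB_STARTED` is the runner's job-started hook variable and `shutdown -c` cancels a pending shutdown. The sketch also assumes an OS-level shutdown terminates the instance, as the test result above suggests.

```typescript
// Hypothetical helper (not part of the action): build startup-script lines
// that power the instance off unless a job starts within maxIdleMinutes.
function buildWatchdogLines(maxIdleMinutes: number): string[] {
  if (!Number.isInteger(maxIdleMinutes) || maxIdleMinutes < 1) {
    throw new Error("maxIdleMinutes must be a positive integer");
  }
  return [
    // Safety net: schedule a power-off in case no job ever starts.
    `shutdown -P +${maxIdleMinutes}`,
    // Job-started hook: cancel the pending power-off; "|| true" keeps the
    // hook from failing the job if nothing is scheduled.
    'echo "shutdown -c || true" > $CURRENT_PATH/cancel_shutdown_script.sh',
    "chmod +x $CURRENT_PATH/cancel_shutdown_script.sh",
    "export ACTIONS_RUNNER_HOOK_JOB_STARTED=$CURRENT_PATH/cancel_shutdown_script.sh",
  ];
}
```

With this approach the existing job-completed hook would still schedule the final `shutdown -P +1` after the job ends, since the boot-time shutdown has been canceled by then.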

Do you have an example of a workflow that could trigger a different type of failure?