Hung or stuck instances not torn down

      `echo "shutdown -P +1" > $CURRENT_PATH/shutdown_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_script.sh",
      `echo "./config.sh remove --token ${runnerRegistrationToken.token} || true" > $CURRENT_PATH/shutdown_now_script.sh`,
      `echo "shutdown -h now" >> $CURRENT_PATH/shutdown_now_script.sh`,
      "chmod +x $CURRENT_PATH/shutdown_now_script.sh",
      "export ACTIONS_RUNNER_HOOK_JOB_COMPLETED=$CURRENT_PATH/shutdown_script.sh",

The code above writes a shutdown script (shutdown -P +1, i.e. power off after one minute) and sets ACTIONS_RUNNER_HOOK_JOB_COMPLETED so that the runner executes it as soon as a job finishes.
We also have github_job_start_ttl_seconds, which defines how long an instance is allowed to stay idle before it picks up a job.
Finally, there is the instance TTL, which takes effect if both of the options above fail for any reason.
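
As a rough illustration of how these three layers stack (this is a sketch, not the action's actual implementation), the decision can be modelled as a single function. Everything here except github_job_start_ttl_seconds is a hypothetical name, and the timestamps are assumed to be plain seconds:

```typescript
// Hypothetical model of the three teardown safeguards described above.
// Only github_job_start_ttl_seconds comes from the action; every other
// name is an assumption made for this illustration.

interface InstanceState {
  launchedAtSec: number;      // when the instance was launched
  jobStartedAtSec?: number;   // undefined if no job has been picked up yet
  jobCompletedAtSec?: number; // undefined while a job is running (or hung)
}

interface TeardownConfig {
  hookShutdownDelaySec: number; // 60 for `shutdown -P +1`
  jobStartTtlSec: number;       // github_job_start_ttl_seconds
  instanceTtlSec: number;       // overall instance TTL (last resort)
}

type TeardownReason = "job-completed-hook" | "job-start-ttl" | "instance-ttl";

function teardownReason(
  state: InstanceState,
  cfg: TeardownConfig,
  nowSec: number,
): TeardownReason | null {
  const ageSec = nowSec - state.launchedAtSec;

  // Layer 1: the job-completed hook schedules a delayed shutdown.
  if (
    state.jobCompletedAtSec !== undefined &&
    nowSec - state.jobCompletedAtSec >= cfg.hookShutdownDelaySec
  ) {
    return "job-completed-hook";
  }

  // Layer 2: the instance stayed idle too long without picking up a job.
  if (state.jobStartedAtSec === undefined && ageSec >= cfg.jobStartTtlSec) {
    return "job-start-ttl";
  }

  // Layer 3: the instance TTL catches everything else.
  if (ageSec >= cfg.instanceTtlSec) {
    return "instance-ttl";
  }

  return null; // still within all limits
}
```

Under this model, a job that starts but never completes is caught only by the instance TTL, because the job-completed hook never fires and the idle check no longer applies once a job has started.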

I just tested with a job that had an error intentionally introduced to make it fail. Exactly 1 minute after the failure, the instance was terminated.

Do you have an example of a workflow that could trigger a different type of failure?
