Open pharindoko opened 2 months ago
I'm not a PowerShell expert, but I do believe we are already doing that. Are you sure these are the logs of the right instance? It looks like the log of a runner that the idle reaper terminated. In that case, the step function execution should have also been aborted.
Yes, I'm very sure that it's the right instance. We were able to replicate the issue by running the same job again. It's clear that we should fix this job anyway, but it would still be nice to see the machine stopped even when an error occurs in the action function.
try/catch in PowerShell: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_try_catch_finally?view=powershell-7.4
Would you be able to pull up the user data log from that machine so I can better understand what exactly failed there? It should be in C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log. As far as I understand PowerShell, executing a script (like run.cmd executed by action()) doesn't raise exceptions. Either way, I'd like to both fix the error and possibly add try/catch.
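To illustrate the point about scripts not raising exceptions, here is a minimal sketch of how a failing native command could be turned into an exception, with cleanup that still runs. It is not the project's actual user data script. Invoke-Action and Stop-Runner are hypothetical names, and `cmd.exe /c "exit 3"` only stands in for a failing run.cmd.

```powershell
# Hypothetical stand-ins: Invoke-Action and Stop-Runner are not names from the repository.
$ErrorActionPreference = 'Stop'
$script:cleanupRan = $false

function Invoke-Action {
    # A native command such as run.cmd does not throw when it fails;
    # it only sets $LASTEXITCODE, so the exit code is checked explicitly.
    & cmd.exe /c "exit 3"   # stand-in for a failing run.cmd
    if ($LASTEXITCODE -ne 0) {
        throw "action failed with exit code $LASTEXITCODE"
    }
}

function Stop-Runner {
    # Stand-in for reporting the result and terminating the instance.
    $script:cleanupRan = $true
    Write-Output "cleanup: the instance would be terminated here"
}

try {
    Invoke-Action
}
catch {
    Write-Output "action error: $($_.Exception.Message)"
}
finally {
    # Runs whether or not the action threw.
    Stop-Runner
}
```

The `finally` block is what matters for the cost concern: it runs on both the success and the failure path, so termination does not depend on the action finishing cleanly.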
I couldn't find the UserdataExecution.log ...
AWS mentions it here:
You can't find the user data logs
The log files for EC2Launch, EC2Launch v2, and EC2Config contain the output from the standard output and standard error streams. You can access the log files at the following locations:
EC2Launch v2: C:\ProgramData\Amazon\EC2Launch\log\agent.log
EC2Launch: C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log
EC2Config: C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt
I guess we use EC2Launch v2, and I found the agent.log.
I will provide it to you.
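For anyone else hunting for these logs, the three locations quoted above can be checked in one pass. This is a small sketch to run on the instance itself; the paths come from the AWS text quoted in this comment, and the tail length of 50 lines is an arbitrary choice.

```powershell
# Log locations as quoted from the AWS troubleshooting text above.
$logPaths = [ordered]@{
    'EC2Launch v2' = 'C:\ProgramData\Amazon\EC2Launch\log\agent.log'
    'EC2Launch'    = 'C:\ProgramData\Amazon\EC2-Windows\Launch\Log\UserdataExecution.log'
    'EC2Config'    = 'C:\Program Files\Amazon\Ec2ConfigService\Logs\Ec2ConfigLog.txt'
}

foreach ($agent in $logPaths.Keys) {
    $path = $logPaths[$agent]
    if (Test-Path -LiteralPath $path) {
        Write-Output "$agent log found: $path"
        # Show only the end of the file; 50 lines is an arbitrary choice.
        Get-Content -LiteralPath $path -Tail 50
    }
    else {
        Write-Output "$agent log not present: $path"
    }
}
```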
Hey @kichik,
I have one special case that I can replicate. Although the job completed successfully in GitHub, the EC2 instance and the step function execution are still running.
runner.log
What's the problem:
The machine is still running and we waste money until we notice it. (Yes, additional alerting would make sense in this case too, but I don't have that in place yet.)
Proposal:
It would be great to have a try/catch block around the action statement in PowerShell https://github.com/CloudSnorkel/cdk-github-runners/blob/f08da20f3fe70ae8fc86f85db304b15e191601f3/src/providers/ec2.ts#L165
to ensure the machine gets terminated https://github.com/CloudSnorkel/cdk-github-runners/blob/f08da20f3fe70ae8fc86f85db304b15e191601f3/src/providers/ec2.ts#L172