actions / runner-container-hooks

Runner Container Hooks for GitHub Actions
MIT License
67 stars 43 forks source link

Rejections from `k8s.Exec.exec()` are not handled #107

Closed abonander closed 6 months ago

abonander commented 10 months ago

I've just spent the last couple hours trying to figure out an issue similar to #41 or https://github.com/actions/actions-runner-controller/issues/2805.

The latter does not apply to my situation as I have a self-hosted cluster in microk8s which does not appear to have any sort of API rate limits.

While I have yet to figure out the true error, I'm fairly certain that this unhandled promise rejection is suppressing it, and it appears to be due to a misunderstanding in how a manually constructed Promise works in execPodStep(): https://github.com/actions/runner-container-hooks/blob/b58b13134a59a8704350f858ad802afcf2c57c26/packages/k8s/src/k8s/index.ts#L215-L216

The documentation for the Promise() constructor says this of the closure passed to it, which it calls executor: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/Promise#description

The executor return value is ignored. return statements within the executor merely impact control flow and alter whether a part of the function is executed, but do not have any impact on the promise's fulfillment value.

Since the closure is labeled async, it thus returns a Promise which is being ignored, which means if that exec.exec() call throws an exeception, it will lead to an unhandled promise rejection.

The invocation of exec.exec() should change to a chained style with an explicit .catch() call:

new Promise(function (resolve, reject) {
    exec.exec(/* args */)
        .catch(reject)
})

Or else the await exec.exec() statement should be wrapped in a try/catch.

nielstenboom commented 8 months ago

I can confirm that wrapping this part of the code in a try/catch resolves this issue and allows you to see the actual error, thanks for the hint!

https://github.com/promaton/runner-container-hooks/pull/9

In our case it's a rare failure of a non 2xx response of the Kubernetes API and will probably be resolved when https://github.com/actions/runner-container-hooks/issues/103 is implemented 👍

nikola-jokic commented 8 months ago

Hey, I don't believe that this is an issue. If you look at the code, we pass an arrow function that calls resolve or reject based on the status. Exec resolves to a websocket. The response object is the one that should decide if the promise should resolve or not.

nielstenboom commented 8 months ago

Hey, I don't believe that this is an issue. If you look at the code, we pass an arrow function that calls resolve or reject based on the status. Exec resolves to a websocket. The response object is the one that should decide if the promise should resolve or not.

I'm not a Typescript guy so I cannot really contribute to the things you mention. But adding this try/catch mentioned in this issue really helped us resolve these kind of cryptic error messages occurring in CI:

Post job cleanup.
Run '/runner/k8s/index.js'
node:internal/process/promises:[2](https://github.com/redacted/actions/runs/xxx/job/xx#step:24:2)79
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[UnhandledPromiseRejection: This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). The promise rejected with the reason "#<ErrorEvent>".] {
  code: 'ERR_UNHANDLED_REJECTION'
}
nikola-jokic commented 8 months ago

Right, this is a strong indication that this may also fail. I'll take this one, thank you for explaining the problem further @nielstenboom! And thank you @abonander for submitting the issue!