Build script exit codes are all changed to 1

misterfifths commented 5 months ago

In our .gitlab-ci.yml, we use the allow_failure:exit_codes feature to mark certain types of CI failures as acceptable. It works by inspecting the exit code of the build script and comparing it against the given list.

However, in the case of a failure, the executor only ever exits with code 1 (or BUILD_FAILURE_EXIT_CODE if it is set); it ignores any actual exit code from the script run in the VM. The relevant code is here: https://github.com/cirruslabs/gitlab-tart-executor/blob/1f5b77e214bd74ff8cc59458ddb7a83a4301e8b1/cmd/gitlab-tart-executor/main.go#L27-L39.

It would be great if the exit code of the build script in the VM were propagated as the exit code of the executor itself, so that we could use allow_failure. Naively, that might involve inspecting the error in the code above to see if it's an ssh.ExitError and, if so, passing along its exit code. That might be a little invasive, though, since main.go doesn't import the ssh library.

edigaryev commented 5 months ago

I'm afraid there's nothing we can do to make this work on the GitLab Tart Executor's side (but I'd love to be wrong here).

The reason is that GitLab Runner does not allow Custom executors to return anything other than BUILD_FAILURE_EXIT_CODE or SYSTEM_FAILURE_EXIT_CODE. This logic is hardcoded in executors/custom/command/command.go:

func (c *command) waitForCommand() {
    err := c.cmd.Wait()

    eerr, ok := err.(*exec.ExitError)
    if ok {
        exitCode := getExitCode(eerr)
        switch {
        case exitCode == BuildFailureExitCode:
            err = &common.BuildError{Inner: eerr, ExitCode: exitCode}
        case exitCode != SystemFailureExitCode:
            err = &ErrUnknownFailure{Inner: eerr, ExitCode: exitCode}
        }
    }

    c.waitCh <- err
}
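
The switch in waitForCommand reduces to a three-way classification, restated below as a standalone sketch. The classify function and its string labels are hypothetical, and the codes 1 and 2 in main are example values only. Note that the BuildError branch records ExitCode only when it equals the build-failure code, so even on the accepted path the script's original status is not what the runner sees.

```go
package main

import "fmt"

// classify is a hypothetical restatement of the switch in waitForCommand:
// it reports how a non-zero exit code from the Custom executor is treated.
func classify(exitCode, buildFailure, systemFailure int) string {
	switch {
	case exitCode == buildFailure:
		// Wrapped in common.BuildError; the only path eligible for
		// allow_failure:exit_codes.
		return "build failure"
	case exitCode != systemFailure:
		// Wrapped in ErrUnknownFailure; the job fails as a system failure.
		return "unknown failure"
	default:
		// The original exec.ExitError is passed through unchanged.
		return "system failure"
	}
}

func main() {
	const buildFailure, systemFailure = 1, 2 // example values only
	for _, code := range []int{1, 2, 3, 42} {
		fmt.Printf("exit %d -> %s\n", code, classify(code, buildFailure, systemFailure))
	}
}
```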

I've tried modifying the GitLab Tart Executor to pass through the exit code from the SSH session:

diff --git a/cmd/gitlab-tart-executor/main.go b/cmd/gitlab-tart-executor/main.go
index 9847e0f..1db1247 100644
--- a/cmd/gitlab-tart-executor/main.go
+++ b/cmd/gitlab-tart-executor/main.go
@@ -2,7 +2,9 @@ package main

 import (
        "context"
+       "errors"
        "github.com/cirruslabs/gitlab-tart-executor/internal/commands"
+       "golang.org/x/crypto/ssh"
        "log"
        "os"
        "os/signal"
@@ -35,6 +37,12 @@ func main() {

        if err := commands.NewRootCmd().ExecuteContext(ctx); err != nil {
                log.Println(err)
+
+               var sshExitError *ssh.ExitError
+               if errors.As(err, &sshExitError) {
+                       os.Exit(sshExitError.ExitStatus())
+               }
+
                os.Exit(failureExitCode)
        }
 }

But this simply results in the error "ERROR: Job failed (system failure): unknown Custom executor executable exit code X; executable execution terminated with: exit status X". As a result, allow_failure:exit_codes is never evaluated, because what the runner raises is not a build error but an unknown-failure error.
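
The dead end can be traced with a simplified model. It assumes, as the message above indicates, that allow_failure:exit_codes is consulted only for build errors and matched against the exit code the runner recorded. jobOutcome, the example codes 1 and 2, and the allowed list are all hypothetical.

```go
package main

import "fmt"

// contains reports whether codes includes c.
func contains(codes []int, c int) bool {
	for _, v := range codes {
		if v == c {
			return true
		}
	}
	return false
}

// jobOutcome is a hypothetical model of the chain: the patched executor exits
// with the script's status, the runner classifies that code, and only a build
// failure is checked against allowedCodes.
func jobOutcome(scriptStatus, buildFailure, systemFailure int, allowedCodes []int) string {
	executorExit := scriptStatus // pass-through, as in the patch above
	switch {
	case executorExit == buildFailure:
		if contains(allowedCodes, executorExit) {
			return "failed (allowed)"
		}
		return "failed (build failure)"
	case executorExit != systemFailure:
		return "failed (system failure: unknown exit code)"
	default:
		return "failed (system failure)"
	}
}

func main() {
	const buildFailure, systemFailure = 1, 2 // example values only
	allowed := []int{137}                    // a script status the team wants to tolerate
	for _, status := range []int{1, 137} {
		fmt.Printf("script exit %d -> %s\n", status, jobOutcome(status, buildFailure, systemFailure, allowed))
	}
}
```

In this model, listing 137 never helps: a script exiting with 137 is reported as an unknown exit code before the allowed list is ever checked.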

misterfifths commented 5 months ago

Oh, I had no idea it was a GitLab-level issue. Thanks for the thorough investigation!

cirruslabs/gitlab-tart-executor #77