docker / cli

The Docker CLI
Apache License 2.0
4.88k stars 1.92k forks source link

cmd/docker: split handling exit-code to a separate utility #5229

Closed thaJeztah closed 3 months ago

thaJeztah commented 3 months ago

This allows dockerMain() to return an error "as usual", and puts the responsibility for turning that into an appropriate exit-code in main() (which also sets the exit-code when terminating).

We could consider putting this utility in the cli package and exporting it if would be useful for doing a similar handling in plugins.

- A picture of a cute animal (not mandatory but encouraged)

thaJeztah commented 3 months ago

Working on some other changes related to errors, but this one was pretty isolated, so let me open a small PR for this.

@Benehiko @vvoland ptal

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 61.49%. Comparing base (cad08ff) to head (eae7509).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #5229 +/- ## ========================================== + Coverage 61.47% 61.49% +0.01% ========================================== Files 298 298 Lines 20815 20811 -4 ========================================== Hits 12797 12797 + Misses 7106 7102 -4 Partials 912 912 ```
thaJeztah commented 3 months ago

AH! I broke something probably; I was indeed thinking if the context.Cancelled() should always be ignored here, or only from within dockerMain() 🤔

=== FAIL: e2e/cli-plugins TestPluginSocketCommunication/detached/the_plugin_does_not_get_signalled (1.00s)
    socket_test.go:194: context cancelled
        Status: , Code: 2

    socket_test.go:199: assertion failed: 
        --- ←
        +++ →
        @@ -1,3 +1,2 @@
         context cancelled
        -Status: , Code: 2

Slightly wondering now if the test should be checking for the context cancelled error to appear in the output, or if the actual error should be the Status: , Code: 2 that should be printed https://github.com/docker/cli/blob/5aae44baaaad82d07cc852293bdb5769d322f59a/e2e/cli-plugins/socket_test.go#L193-L199

thaJeztah commented 3 months ago

Wait; what's this? Do we have 2 separate spellings of cancelled in our codebase? (Notice the cancelled vs canceled (2 -> 1 l))

=== FAIL: e2e/cli-plugins TestPluginSocketCommunication/detached/the_plugin_does_not_get_signalled (1.00s)
    socket_test.go:198: context cancelled

    socket_test.go:207: assertion failed: context cancelled (string) != context canceled (string)

This happened when I changed the test to use context.Canceled.Error();

assert.Check(t, is.Equal(strings.TrimSpace(string(out)), context.Canceled.Error()))
thaJeztah commented 3 months ago

Ha! We do! https://github.com/docker/cli/blob/5aae44baaaad82d07cc852293bdb5769d322f59a/e2e/cli-plugins/plugins/presocket/main.go#L64-L69

thaJeztah commented 3 months ago

And... after my initial confusion around context cancel(l)ed, I think my patch was not "bad", but just now prints the actual (but fugly - to be looked at) cli.StatusError;

=== FAIL: e2e/cli-plugins TestPluginSocketCommunication/detached/the_plugin_does_not_get_signalled (1.00s)
    socket_test.go:208: assertion failed: 
        --- actual
        +++ expected
        @@ -1,2 +1 @@
        -test-socket: exiting after context was done
        -Status: , Code: 2
        +test-socket: exiting after context was done

Before my patch, such errors would be silently discarded; https://github.com/docker/cli/blob/5aae44baaaad82d07cc852293bdb5769d322f59a/cmd/docker/docker.go#L52-L54

It's odd that we're constructing our own output, instead of StatusError.Error() handling proper output; it had ONE job, and it did the absolute worst by printing the exit-code as some ugly string. https://github.com/docker/cli/blob/5aae44baaaad82d07cc852293bdb5769d322f59a/cli/error.go#L31-L33

We can probably borrow similar logic / output as used by Go's exec.ExecError -> ProcessState.String(); https://github.com/golang/go/blob/82c14346d89ec0eeca114f9ca0e88516b2cda454/src/os/exec/exec.go#L872-L874 https://github.com/golang/go/blob/82c14346d89ec0eeca114f9ca0e88516b2cda454/src/os/exec_posix.go#L107-L135

func (p *ProcessState) String() string {
    if p == nil {
        return "<nil>"
    }
    status := p.Sys().(syscall.WaitStatus)
    res := ""
    switch {
    case status.Exited():
        code := status.ExitStatus()
        if runtime.GOOS == "windows" && uint(code) >= 1<<16 { // windows uses large hex numbers
            res = "exit status " + itoa.Uitox(uint(code))
        } else { // unix systems use small decimal integers
            res = "exit status " + itoa.Itoa(code) // unix
        }
    case status.Signaled():
        res = "signal: " + status.Signal().String()
    case status.Stopped():
        res = "stop signal: " + status.StopSignal().String()
        if status.StopSignal() == syscall.SIGTRAP && status.TrapCause() != 0 {
            res += " (trap " + itoa.Itoa(status.TrapCause()) + ")"
        }
    case status.Continued():
        res = "continued"
    }
    if status.CoreDump() {
        res += " (core dumped)"
    }
    return res
}
thaJeztah commented 3 months ago

It still is messy, but "baby steps"; at least https://github.com/docker/cli/pull/5231 helped making these Status: <usualy empty>, Code: <you know you can get the exit code in a shell, right?> errors a bit less fugly, but there's still a lot to be done. I DO think that an error-type that allows setting the exit-code to use is useful (heck, exec.ExitError works fine?), but perhaps it should be a wrapper around the error that's created (which in itself could have information to keep), e.g. thinking along the lines of;

if err := doSomething(); err != nil {
    if errdefs.IsInvalidParameter(err) {
        return cli.StatusErr(err, 123)
    }
    return err
}

But we need to have a good look at that; basically, the status should probably ONLY be set if the error MUST be considered final / terminal; once you set the exit-code, that's it? Every caller handling that error must not ignore the error, and make it trickle up so that the CLI terminates with the given status. We need to look at that consideration if that works (but we could still end up having different classes, basically equivalent to the errdefs types we have, which help putting errors in different "categories").

vvoland commented 2 months ago

Some "regression" after this PR:

[I] pawel ❱ docker run --rm -it alpine
/ # ^C

/ # ^C

/ #
exit status 130
[I] pawel ❱ 

Before:

[I] pawel ~ ❱ docker run --rm -it alpine
/ # ^C

/ # ^C

/ # ^C

/ #

[I] pawel ~ ❱

Notice that after this PR exit status 130 is printed.

cc @thaJeztah

thaJeztah commented 2 months ago

Good one; I think it's worth having a ticket for that @vvoland so that we can look into what the intent was there; do we want it to have a non-zero exit status (if so; why don't we have an error message to go with it?) or did some (context) error trickle through and therefore we set a 130 exit status?

And maybe there's cases we DO want to have just the exit status with no message? 🤔

thaJeztah commented 2 months ago

In the "before" case; was the exit status 0? (echo $?)