Proposal: build debugging in buildx (interactive sessions) #1104

Open · opened by tonistiigi 2 years ago

tonistiigi commented 2 years ago

This is a follow-up to the BuildKit debugging issue https://github.com/moby/buildkit/issues/1472 so we can discuss the UX and next steps for features like https://github.com/moby/buildkit/pull/2813. https://github.com/moby/buildkit/issues/1472 is mostly implemented in BuildKit v0.9, with additional signaling patches in v0.10, so unless we missed something, no more (breaking) BuildKit changes should be needed.

Not all of this work needs to end up in the buildx repository, and some of it may be moved out later. I don't want opinionated dev features in buildctl, which should remain a vendor-agnostic test tool, and I also don't want to maintain two similar but different debugging stacks. Aside from that, code reuse is encouraged.

The BuildKit issue concentrated on the internal building blocks for this feature, and you should read it first. Here, I'm proposing steps for incremental PRs that end up with a user-friendly debugging feature.

All flag names are up for later discussion. Naming is hard.

PRs may be combined where it makes sense for review, but all the described steps should be quite independent.

PR1

Add the possibility to interactively run a process, using the NewContainer API, after the build has completed.

docker buildx build --invoke bash .
docker buildx build --invoke entrypoint=/usr/bin/cat,args=/etc/passwd,env=FOO=bar . # optional longform (not a requirement)

The build will run with the progress bar until completion. When the progress bar finishes, the container will be launched. The container redirects all stdio and signals. TTY is enabled if the user enabled TTY for the main process.
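For reference, a rough sketch of the mechanics under the hood, using BuildKit's gateway API (frontend/gateway/client); the runInvoke helper and how it would be wired into buildx are illustrative assumptions, not the final code:

```go
// Illustrative sketch only: mount the completed build result as the rootfs of
// a new container and run the process requested via --invoke.
package invoke

import (
	"context"
	"os"

	gateway "github.com/moby/buildkit/frontend/gateway/client"
	"github.com/moby/buildkit/solver/pb"
)

func runInvoke(ctx context.Context, c gateway.Client, res *gateway.Result, args []string, tty bool) error {
	ref, err := res.SingleRef()
	if err != nil {
		return err
	}

	// Create a container on top of the build result via the NewContainer API.
	ctr, err := c.NewContainer(ctx, gateway.NewContainerRequest{
		Mounts: []gateway.Mount{{
			Dest:      "/",
			Ref:       ref,
			MountType: pb.MountType_BIND,
		}},
	})
	if err != nil {
		return err
	}
	defer ctr.Release(ctx)

	// Redirect all stdio; TTY follows what the user requested for the main process.
	proc, err := ctr.Start(ctx, gateway.StartRequest{
		Args:   args, // e.g. []string{"bash"} for --invoke bash
		Tty:    tty,
		Stdin:  os.Stdin,
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	})
	if err != nil {
		return err
	}
	return proc.Wait()
}
```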

PR2

Add "monitor mode" to the interactive process. When running an interactive process, the user can switch between to process io and monitor process io. Similar to QEMU monitor mode. In monitor mode they can issue additional commands.

In the very first PR, the only supported command may be "exit":

(buildx) exit
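A minimal sketch of that first monitor loop (pure illustration, not actual buildx code), handling only "exit":

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// runMonitor reads monitor commands line by line; in the very first PR the
// only supported command is "exit". Switching between the container IO and
// this prompt (the QEMU-monitor-like toggle) is not shown here.
func runMonitor(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	for {
		fmt.Fprint(out, "(buildx) ")
		if !sc.Scan() {
			return sc.Err()
		}
		switch cmd := strings.TrimSpace(sc.Text()); cmd {
		case "exit":
			return nil // leave monitor mode and stop the interactive container
		case "":
			continue
		default:
			fmt.Fprintf(out, "unknown command: %q\n", cmd)
		}
	}
}

func main() {
	_ = runMonitor(os.Stdin, os.Stdout)
}
```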

PR3

Add "rollback" command to monitor mode. With this command the user can make modifications in the interactive container and when they issue "rollback" command they are brought back to the initial state.

Add "reload" command to monitor mode. This will run the build again(now with possibly updated source) and invoke the shell again.

PR4

Refactor the build command to invoke builds in a background process. This is important because we don't want the lifecycle of a "debugging session" to be locked into a single process. A socket should be created under ~/.buildx, and even if the current process (unexpectedly) dies, its state can be accessed again via the socket.
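A rough sketch of the server side of this, assuming a per-session socket under a hypothetical ~/.buildx/sessions layout:

```go
// Sketch of the PR4 idea: the build runs in a background process that owns a
// unix socket under ~/.buildx, so a debugging session survives the foreground
// CLI process dying. Paths and the wire protocol are illustrative only.
package session

import (
	"net"
	"os"
	"path/filepath"
)

// listenSession creates the per-session socket, e.g.
// ~/.buildx/sessions/<id>.sock (hypothetical layout).
func listenSession(id string) (net.Listener, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	dir := filepath.Join(home, ".buildx", "sessions")
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return nil, err
	}
	sock := filepath.Join(dir, id+".sock")
	_ = os.Remove(sock) // clean up a stale socket from a crashed process
	return net.Listen("unix", sock)
}
```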

PR5

Add list and attach commands to the monitor mode. list would show all the currently active sessions (via the socket described in the previous section). attach would make a session active in the current process. If the session is already active in another process, that process would detach and go to monitor mode.
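And a sketch of the matching client side for list/attach, using the same hypothetical socket layout as above:

```go
// Illustrative only: "list" scans the session sockets left behind by
// background builds, "attach" dials one of them.
package session

import (
	"net"
	"os"
	"path/filepath"
	"strings"
)

// listSessions returns the ids of sessions that currently have a socket
// under ~/.buildx/sessions.
func listSessions() ([]string, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	entries, err := os.ReadDir(filepath.Join(home, ".buildx", "sessions"))
	if err != nil {
		return nil, err
	}
	var ids []string
	for _, e := range entries {
		if strings.HasSuffix(e.Name(), ".sock") {
			ids = append(ids, strings.TrimSuffix(e.Name(), ".sock"))
		}
	}
	return ids, nil
}

// attach connects to a session socket; the caller then proxies stdio and
// monitor traffic over this connection.
func attach(id string) (net.Conn, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	return net.Dial("unix", filepath.Join(home, ".buildx", "sessions", id+".sock"))
}
```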

PR6

Add an exec command to execute new processes in the same debug container. All processes are attachable as described in the previous section.

PR7

docker buildx build --invoke=debug-shell should go directly to monitor mode where processes can be created with exec. We can also have docker buildx debug-shell to start monitor mode without a specific container context.

PR8

Add docker buildx build --invoke=on-error. In this mode, if the build ends with an error, a debug shell will be opened at the error location. The error returned by BuildKit is typed and contains references to the state of the error and to the state at the beginning of the failed step. Monitor commands allow switching between these states. The error also includes a source map that can be shown to the user.
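A rough sketch of how the on-error path could detect debuggable failures, assuming the typed error is BuildKit's errdefs.SolveError (which carries the failed op and references to its state); the helper name is illustrative:

```go
package onerror

import (
	"errors"

	"github.com/moby/buildkit/solver/errdefs"
)

// isDebuggable reports whether a build failure comes with an execution state
// that the monitor can attach to. A Dockerfile parse error, for example, has
// no such state and should not launch an interactive container.
func isDebuggable(err error) (*errdefs.SolveError, bool) {
	var solveErr *errdefs.SolveError
	if errors.As(err, &solveErr) {
		return solveErr, true
	}
	return nil, false
}
```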

Next:

In the next steps we can write more specific proposals for:

Breakpoint debugger

There are two ways to approach this. As the builder is lazy, we can call Solve, which will return a result without actually evaluating it. This result can be converted to a DefinitionOp, which contains the source locations. Now this DefinitionOp can be mutated to build up the breakpoint and then evaluated and run interactively. I think this is similar to the wrapper in buildg, without requiring a BuildKit update or proxy. The problem with this approach is the case where a frontend does multiple Solve calls (maybe in parallel); this would start to conflict with the debugger logic. Therefore I think a better approach could be to define this as a frontend capability and send the breakpoint info with the build opts to the frontend. If the frontend supports debugging, it would stop on the breakpoint and return the result that the debugger will then show.
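A sketch of the frontend-capability approach: the debugger passes breakpoint locations as build opts and a debugging-capable frontend reads them back. The "debug.breakpoints" key and its encoding are purely hypothetical:

```go
package breakpoint

import (
	"strings"

	"github.com/moby/buildkit/client"
	gateway "github.com/moby/buildkit/frontend/gateway/client"
)

// withBreakpoints is what the debugger (buildx side) would do when issuing
// the solve: attach the breakpoint locations as frontend attrs.
func withBreakpoints(opt client.SolveOpt, lines []string) client.SolveOpt {
	if opt.FrontendAttrs == nil {
		opt.FrontendAttrs = map[string]string{}
	}
	opt.FrontendAttrs["debug.breakpoints"] = strings.Join(lines, ",") // e.g. "Dockerfile:12,Dockerfile:30"
	return opt
}

// readBreakpoints is what a debugging-capable frontend would do with the
// opts it receives.
func readBreakpoints(c gateway.Client) []string {
	v := c.BuildOpts().Opts["debug.breakpoints"]
	if v == "" {
		return nil
	}
	return strings.Split(v, ",")
}
```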

Monitor mode

Add more functions to monitor mode, e.g. commands to inspect file layouts and transfer files. Keep shell history between invocations.

Buildx bake

Enable interactive sessions in buildx bake. E.g. docker buildx bake dev could build all the images that are part of the project, run them, and put the user into a dev container. Inside the dev container they can switch between active processes, etc.

@ktock @crazy-max

ktock commented 2 years ago

:+1:

I've recently been working on an interactive debugger for Dockerfile, so I'm willing to work on this feature.

About breakpoints:

Now this DefinitionOp can be mutated to build up the breakpoint and then evaluated and run interactively.

But doesn't this end up repeatedly evaluating vertexes from the beginning up to a breakpoint if the user sets multiple breakpoints? An alternative way would be adding support for "wrapping" a worker, as done by buildg, which implements a worker wrapper that supports breakpoints directly: https://github.com/ktock/buildg/blob/d2d03f80dbcf269626a93f11b4557a42bebcfacf/debug.go#L178-L322

tonistiigi commented 2 years ago

But doesn't this end up repeatedly evaluating vertexes from the beginning up to a breakpoint if the user sets multiple breakpoints?

Ah, I didn't notice that you are even wrapping the Op interface. In practice, sending the previous LLB with extra ops does not affect performance. The previous LLB has already been solved and is inside the active solve graph, so when new LLB with the same digest appears it is directly matched to the existing nodes. This is how DefinitionOp and parallel targets in bake work as well, for example.

jedevc commented 1 year ago

@ktock a massive thanks for all the hard work you've put in so far - I think we're probably close to being able to close this issue :tada: :tada:

Hopefully we'll get some users to try this out and give feedback in the upcoming v0.11 release.

All flag names are up for later discussion. Naming is hard.

I've been trying to think about how we might work on this. The current naming is a bit tricky for new users, I think, and we need to guard against adding too many new args to normal commands. To be more specific:


My first idea was that we could try to rename everything to be debug-themed. So, in my head, that would mean that:

Unfortunately for us... --debug is already claimed as a top-level docker CLI option, and does something entirely different. So we'd be in the weird situation of having an option behave differently depending on where it was in the user command, which isn't great.

We could change debug to be something like dev - I guess it's kind of like a "development mode" for Dockerfiles, so that could work.


@crazy-max suggested to me an idea I like a lot better: just have everything debug-related under a single, top-level debug command:

The details (as I imagine them):

I actually really like the flow of this: it reads really easily, and it's not hard to modify a command that you run so that it gets debugged. Also, all of the debugging is in one place, and doesn't need to be spread across multiple commands (even if, in the code, that might not be 100% true).


I'm curious what people think about the above ideas, or if anyone has alternative ideas we should consider - I think we should work out what we want to do before the next buildx release (and hopefully implement it!), so we can start to get some feedback from users.

ktock commented 1 year ago

Posted by @tonistiigi at https://github.com/docker/buildx/pull/2006#pullrequestreview-1672830646

  • Follow-up. --on=error on a container error gets me into the container, but there is no context of what happened, what the last command was, etc. I think the help command or the monitor messages should describe the current interactive context (build result for a specific target, error result from a command) so it is clear what gets run on exec/reload/rollback.

  • Follow-up. We need an ls command that would list the files in the current dir. If I get an error like runc run failed: unable to start container process: exec: "sh": executable file not found in $PATH, then I have no idea what I'm missing. This could also be an exec of a debug image that has the current mounts mounted somewhere.
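For the ls idea above, a rough sketch using ReadDir on the gateway reference of the current debug target, so it works even when there is no shell in $PATH to exec; the command wiring is illustrative:

```go
package monitor

import (
	"context"
	"fmt"
	"io"
	"os"

	gateway "github.com/moby/buildkit/frontend/gateway/client"
)

// lsCommand lists the entries of dir in the currently attached reference,
// reading directly from the build result/error state.
func lsCommand(ctx context.Context, ref gateway.Reference, dir string, out io.Writer) error {
	entries, err := ref.ReadDir(ctx, gateway.ReadDirRequest{Path: dir})
	if err != nil {
		return err
	}
	for _, st := range entries {
		fmt.Fprintf(out, "%s\t%s\n", os.FileMode(st.Mode), st.Path)
	}
	return nil
}
```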

Posted by @tonistiigi at https://github.com/docker/buildx/pull/2006#pullrequestreview-1681377351

  • Let's say I have two different types of errors. One is a wrong Dockerfile command and the other is a process error.
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN2 --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 |       set -e
--------------------
ERROR: dockerfile parse error on line 60: unknown instruction: RUN2 (did you mean RUN?)
[+] Building 0.0s (0/0)                                                                                                                                                                                                         docker:desktop-linux
Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "o5e8x1ty9nn2j93m62b8zmhdn". Press Ctrl-a-c to switch to the new container
Switched IO
------
Dockerfile:60
--------------------
  59 |     ARG TARGETPLATFORM
  60 | >>> RUN --mount=type=bind,target=. \
  61 | >>>   --mount=type=cache,target=/root/.cache \
  62 | >>>   --mount=type=cache,target=/go/pkg/mod \
  63 | >>>   --mount=type=bind,from=buildx-version,source=/buildx-version,target=/buildx-version <<EOT
  64 | >>>   set -e
  65 | >>>   xx-go2 --wrap
  66 | >>>   DESTDIR=/usr/bin VERSION=$(cat /buildx-version/version) REVISION=$(cat /buildx-version/revision) GO_EXTRA_LDFLAGS="-s -w" ./hack/build
  67 | >>>   xx-verify --static /usr/bin/docker-buildx
  68 | >>> EOT
  69 |
--------------------
ERROR: process "/bin/sh -c   set -e\n  xx-go2 --wrap\n  DESTDIR=/usr/bin VERSION=$(cat /buildx-version/version) REVISION=$(cat /buildx-version/revision) GO_EXTRA_LDFLAGS=\"-s -w\" ./hack/build\n  xx-verify --static /usr/bin/docker-buildx\n" did not complete successfully: exit code: 127
[+] Building 0.0s (0/0)                                                                                                                                                                                                         docker:desktop-linux
Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "l3x31fm3i08owokoxwtazeuuw". Press Ctrl-a-c to switch to the new container
/ #

As expected, only the second one is debuggable (and only the second one opens a shell). But both outputs print the same messages about interactive containers and switching IO. It should be clearer that these are different types of errors, why the first one does not create an execution context, and what runs in the shell of the second one.