canonical / pebble

Take control of your internal daemons!
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
143 stars 54 forks source link

Pebble exec, when in enforced interactive mode, is prone to losing some of the command output #306

Closed dmitry-lyfar closed 1 month ago

dmitry-lyfar commented 11 months ago

If one enforces the pebble exec interactive mode when executing a command and stdin returns EOF sooner than reading from the commands stdout ends, it may result in some output lost. Consider:

> pebble exec -i -t -- tee < sometextfile
> pebble exec -i -t -- ls </dev/null

In both cases, the command's output will either be nonexistent or partial. This is likely due to the client closing the websocket connection with the server (/v1/tasks/<task-id>/websocket/stdio) before the command's stdout is fully read. In the interactive mode, stdio websocket is used to redirect both stdin and stdout, hence, if the client's stdin go routine sends the end message to the websocket server it affects the routine reading stdout (see the line 68): https://github.com/canonical/pebble/blob/613429877dfafaa891f8d57fd4b7aee6e5a3d494/internals/wsutil/wsutil.go#L54-L70

If sending the end message is delayed until the stdout routine reads all the produced output, it resolves the issue.

benhoyt commented 11 months ago

Thanks for the report, @dmitry-lyfar. Just for the record, the three different cases I've seen are as follows (with the first being the most common):

$ PEBBLE=~/pebble ./pebble exec -i -t -- tee <go.mod
[exit code 129]

$ PEBBLE=~/pebble ./pebble exec -i -t -- tee <go.mod
error: cannot perform the following tasks:
- exec command "tee" (fork/exec /usr/bin/tee: input/output error)
[exit code 1]

$ PEBBLE=~/pebble ./pebble exec -i -t -- tee <go.mod
module github.com/canonical/pebble

go 1.20

require (
    github.com/canonical/go-flags v0.0.0-20230403090104-105d09a091b8
    github.com/canonical/x-go v0.0.0-20
[exit code 129]
dmitry-lyfar commented 3 months ago

@benhoyt This is the relevant issue from LXD that I mentioned to you before: https://github.com/canonical/lxd/issues/12285. Also, please see this comment on the GetPollRevents: https://github.com/canonical/lxd/issues/12210#issuecomment-1761519380.

dmitry-lyfar commented 1 month ago

Extending the list of the known exit codes (no output):

go run ./cmd/pebble exec -i -t -- tee < HACKING.md
exit status 137
dmitry-lyfar commented 1 month ago

@benhoyt @IronCore864 I think I narrowed the issue to a POC that potentially solves it. The root problem is that by using -i we enforce the server to use a single master pty descriptor to read and write data from/to the client. If the descriptor is closed at the time Pebble finishes reading stdin (but not writing to stdout), we get the issues above.

Pebble starts two go routines that mirror stdin/stdout from the client.

  1. Read and mirror stdout from master to the websocket (over to the client).
  2. Read stdin from the websocket and send it to master.

So, on:

go run ./cmd/pebble exec -i -- tee < ./HACKING.md

the go routine from p. 2 reads the input entirely, writes it to the master, sees that there is an EOF (actually, "end" from the websocket) and closes the master descriptor. The go routine from p.1 is still working and reading the output from the master descriptor to mirror it to the websocket. Given that it has just been closed, it detects POLINVAL and exists even if there is data to read.

This is not a problem when the interactive mode is set automatically due to the command being run in the terminal as the reading go routine (p.2) would only return when a user terminates the process somehow.

The suggested solution is simple -- write EOF to the pty to indicate the end of stdin to the process. This way the examples using tee would terminate correctly upon writing the remaining output and exiting due to EOF. I see that there are multiple places where the master descriptor can be closed (i.e. in MirrorToWebsocket and in afterClosers). I could not identify any scenarios where this solution would leak a file descriptor.

See the POC https://github.com/canonical/pebble/pull/466.

benhoyt commented 1 month ago

This is a nice find and fix, thanks a lot @dmitry-lyfar! I'll look at the change a bit more closely in context in the next few days. It certainly seems to work nicely at first blush.

IronCore864 commented 1 month ago

@benhoyt @IronCore864 I think I narrowed the issue to a POC that potentially solves it. The root problem is that by using -i we enforce the server to use a single master pty descriptor to read and write data from/to the client. If the descriptor is closed at the time Pebble finishes reading stdin (but not writing to stdout), we get the issues above.

Pebble starts two go routines that mirror stdin/stdout from the client.

  1. Read and mirror stdout from master to the websocket (over to the client).
  2. Read stdin from the websocket and send it to master.

So, on:

go run ./cmd/pebble exec -i -- tee < ./HACKING.md

the go routine from p. 2 reads the input entirely, writes it to the master, sees that there is an EOF (actually, "end" from the websocket) and closes the master descriptor. The go routine from p.1 is still working and reading the output from the master descriptor to mirror it to the websocket. Given that it has just been closed, it detects POLINVAL and exists even if there is data to read.

This is not a problem when the interactive mode is set automatically due to the command being run in the terminal as the reading go routine (p.2) would only return when a user terminates the process somehow.

The suggested solution is simple -- write EOF to the pty to indicate the end of stdin to the process. This way the examples using tee would terminate correctly upon writing the remaining output and exiting due to EOF. I see that there are multiple places where the master descriptor can be closed (i.e. in MirrorToWebsocket and in afterClosers). I could not identify any scenarios where this solution would leak a file descriptor.

See the POC #466.

Thanks @dmitry-lyfar for the diagnosis and the PoC.

Before this, I had been trying to fix it using various methods, including porting the fix from the LXD PR you mentioned, but without success.

After reading this PoC it came to me that the close of the master descriptor here is premature; it will be closed in the afterClosers.

I tested it with long inputs like the tee command and short ones like a simple ls, and both worked as expected. Let's merge your PR.

benhoyt commented 1 month ago

Once again, much appreciated, @dmitry-lyfar.