containers / podman-bootc

Apache License 2.0
24 stars 9 forks source link

disk: Pass /dev/null to containers.attach stdin #59

Closed ckyrouac closed 4 months ago

ckyrouac commented 4 months ago

When using os.Stdin, the initial ssh connection pipe is broken.

ckyrouac commented 4 months ago

@cgwalters This seems to behave the same as using os.Stdin on my machine. I don't see any progress bar either way though, so maybe I'm doing something wrong.

When using os.Stdin, the initial ssh connection is busted until sending a newline. This is the error:

ERRO[0110] Failed to write input to service: write unix @->/home/chris/.local/share/containers/podman/machine/qemu/podman.soc
k: write: broken pipe
cgwalters commented 4 months ago

Argh, sorry for the regression here!

I don't see any progress bar either way though, so maybe I'm doing something wrong.

You need bootc built from git main with the changes from https://github.com/containers/bootc/pull/655

This is effectively reverting https://github.com/containers/podman-bootc/pull/55 right? I think nil is equivalent to /dev/null.

Hmmm....this makes me think that what's going wrong here is we're not correctly detaching from the container? There's a whole lot going on in podman attach.go.

cgwalters commented 4 months ago

Tangentially this code is IMO another great example of the superficial simplicity of Go being paid down later...trying to use nil for "unset value" except that comes back to bite with "Note that an interface value that holds a nil concrete value is itself non-nil. ". In Rust it'd just be Option<T>...

germag commented 4 months ago

@cgwalters This seems to behave the same as using os.Stdin on my machine. I don't see any progress bar either way though, so maybe I'm doing something wrong.

When using os.Stdin, the initial ssh connection is busted until sending a newline. This is the error:

ERRO[0110] Failed to write input to service: write unix @->/home/chris/.local/share/containers/podman/machine/qemu/podman.soc
k: write: broken pipe

I didn't see that when I tested it, it's true that the e2e test don't run on my machine, for some reason it got killed by the OOM killer

germag commented 4 months ago

Tangentially this code is IMO another great example of the superficial simplicity of Go being paid down later...trying to use nil for "unset value" except that comes back to bite with "Note that an interface value that holds a nil concrete value is itself non-nil. ". In Rust it'd just be Option<T>...

A famous question asked to the Go team at a conference: "Why did you choose to ignore any research about type systems since the 1970s"

cgwalters commented 4 months ago

I briefly dug at this a bit more, and what I think is going on is again a classic Go problem¹. Basically, the Attach API spawns a bunch of goroutines, such as this one which copies stdin.

In theory, us cancelling the context should terminate all work spawned from the Attach API...but it doesn't, because goroutines are "stackful" and famously not easily cancellable. And so the Attach API just leaks these goroutines.

If one just invokes podman run --rm -ti someimage somecommand (which is what we're replicating in custom code) the basic fact is that when the podman process exits all its goroutines and process state are torn down reliably by the kernel, papering over this leak.

I'm pretty tempted to just switch back to forking off podman run here. It's not clear to me that using the Go API bindings to the HTTP API is buying us a lot of value here.

That said, I think we could fix the podman code here by changing it to force close the socket when the context is cancelled. That's the usual way to cancel reads in Go. It's still fundamentally racy though without going to the extra level of having something like a "cancellation complete" channel in this API that synchronously waits for the worker goroutines to exit though.

¹ Yes, yes it's a poor craftsman who blames their tools and to be honest, asynchronous Rust also has its share of traps - and that trap is very much precisely because of how cancellation of an async task is just a simple "drop"...but, at least it's pretty straightforward to audit Rust codebases for use of select.

cgwalters commented 4 months ago

https://github.com/containers/podman-bootc/pull/61 is an alternative fix

germag commented 4 months ago

I'm closing this since #61 supersedes it