containers / youki

A container runtime written in Rust
https://containers.github.io/youki/
Apache License 2.0
6k stars 332 forks source link

Unexpected `chdir` invoked on container `init` and `start` #2772

Closed Mossaka closed 1 month ago

Mossaka commented 2 months ago

While investigating a performance issue, I observed that the working directory /run/containerd/io.containerd.runtime.v2.task/<namespace>/<containerid>/ becomes inaccessible or gets deleted after executing the shim::wait() call in the runwasi shim process. This deletion prevents the shim process from reading the address file to delete the shim socket. (e.g. ref code)

Logs

I ran bpftrace on unlink and unlinkat syscalls on that paths and found that youki inner process unlinks the bundle path before containerd calls delete-shim (before process 2611761 gets started).

Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611672
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611681
PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611681 (client_handler): File unlinkat in target directory: /run/containerd/wasmtime/default/testwasm
Process started: /usr/local/bin/containerd-shim-wasmtime-v1 PID: 2611761
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/.testwasm..
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: .testwasm
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 569984 (containerd): File unlinkat in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stderr
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdout
PID 2611658 (ctr): File unlinkat in target directory: testwasm-stdin

Specifically, this caught my attention: PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..

Question:

I am raising this issue to try to understand why youki does that. This might be the reason why the shim process is not able to delete the socket address after the ttrpc server shuts down.

FYI: @utam0k @jprendes

utam0k commented 2 months ago

I want to make sure something before the investigation: a. Don't the executor and post hook you passed to libcontainer call unlink? b. Is there the smallest step to reproduce this? c. May I ask you to give us before and after syscalls to help us understand?

Mossaka commented 2 months ago

I will try to reproduce this in youki, getting back to you later.

Mossaka commented 2 months ago

Okay I spent more time tracing where the root cause is, and found that after handling the Create request in runwasi, the shim process's current directory has been set to the container root_directory (e.g. /run/youki/<ns>/<id>) by youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore.

Question: why does youki chdir to the container directory at init and container_start?

utam0k commented 2 months ago

@Furisto Hi, Thomas. I'd like to know about your comment https://github.com/containers/youki/pull/143/files#r673503679. Is this assuming that console_socket was a relative path?

utam0k commented 2 months ago

Is there a problem here? https://github.com/containers/youki/blob/377a7ca884c0ad30c8646403fe6d5a941fcc99e1/crates/libcontainer/src/tty.rs#L94-L99

utam0k commented 2 months ago

~~Sorry, but I've created another PR to fix it. https://github.com/containers/youki/pull/2780~~

YJDoc2 commented 1 month ago

Hey @Mossaka , The related PR will release soon, but I had a question with this issue - You mentioned that

... youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore. Question: why does youki chdir to the container directory at init and container_start?

I'm not sure why youki setting cwd in the container init process would have any issues with shim? The start, run and delete youki processes (created using youki create , youki start and youki delete resp.) would run independent of each other, so probably the only potential cause of the removed dir would be in the delete process right? What would be the issue with start and run processes (and by extension init )?

Mossaka commented 1 month ago

I'm not sure why youki setting cwd in the container init process would have any issues with shim?

The shim process's working directory has been changed by youki APIs, and that directory was never changed back (and in fact, was deleted in youki delete). The shim is not using youki as a CLI, but as a library. That's why it doesn't matter chdir is invoked in create, start, or delete steps. Once it's invoked, the shim process's global state is changed.

YJDoc2 commented 1 month ago

Hey, thanks for the explanation, you are completely right. I had missed the library invocation aspect completely!