Closed Mossaka closed 1 month ago
I want to make sure something before the investigation:
a. Don't the executor and post hook you passed to libcontainer call unlink
?
b. Is there the smallest step to reproduce this?
c. May I ask you to give us before and after syscalls to help us understand?
I will try to reproduce this in youki, getting back to you later.
Okay I spent more time tracing where the root cause is, and found that after handling the Create
request in runwasi, the shim process's current directory has been set to the container root_directory
(e.g. /run/youki/<ns>/<id>
) by youki at here. And after the Delete
request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore.
Question: why does youki chdir
to the container directory at init and container_start?
@Furisto Hi, Thomas. I'd like to know about your comment https://github.com/containers/youki/pull/143/files#r673503679. Is this assuming that console_socket
was a relative path?
~~Sorry, but I've created another PR to fix it. https://github.com/containers/youki/pull/2780~~
Hey @Mossaka , The related PR will release soon, but I had a question with this issue - You mentioned that
... youki at here. And after the Delete request, youki cleans up the container path, and so the shim process doesn't have a current directory anymore. Question: why does youki chdir to the container directory at init and container_start?
I'm not sure why youki setting cwd in the container init process would have any issues with shim? The start, run and delete youki processes (created using youki create
, youki start
and youki delete
resp.) would run independent of each other, so probably the only potential cause of the removed dir would be in the delete process right? What would be the issue with start and run processes (and by extension init )?
I'm not sure why youki setting cwd in the container init process would have any issues with shim?
The shim process's working directory has been changed by youki APIs, and that directory was never changed back (and in fact, was deleted in youki delete
). The shim is not using youki as a CLI, but as a library. That's why it doesn't matter chdir
is invoked in create, start, or delete steps. Once it's invoked, the shim process's global state is changed.
Hey, thanks for the explanation, you are completely right. I had missed the library invocation aspect completely!
While investigating a performance issue, I observed that the working directory
/run/containerd/io.containerd.runtime.v2.task/<namespace>/<containerid>/
becomes inaccessible or gets deleted after executing theshim::wait()
call in therunwasi
shim process. This deletion prevents the shim process from reading theaddress
file to delete the shim socket. (e.g. ref code)Logs
I ran bpftrace on
unlink
andunlinkat
syscalls on that paths and found thatyouki
inner processunlink
s the bundle path before containerd callsdelete-shim
(before process2611761
gets started).Specifically, this caught my attention:
PID 2611707 (youki:[2:INIT]): File unlink in target directory: /run/containerd/io.containerd.runtime.v2.task/default/testwasm/..
Question:
I am raising this issue to try to understand why youki does that. This might be the reason why the shim process is not able to delete the socket address after the ttrpc server shuts down.
FYI: @utam0k @jprendes