containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

nerdctl sometimes fails to check whether a container name already exists and ends up creating duplicates #2992

Open apostasie opened 3 months ago

apostasie commented 3 months ago

Description

Title says it all.

Steps to reproduce the issue

Unfortunately, I do not have a reproducer; this seems to happen only under certain circumstances.

This is definitely confirmed, though:

apo@oliphant:~ $ sudo ./nerdctl ps
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED              STATUS    PORTS                     NAMES
5e7d6291a423    docker.io/library/debian:latest                               "sleep 3600"              4 minutes ago        Up                                  restarttest
b579c38e3c55    docker.io/library/debian:latest                               "sleep 123"               6 seconds ago        Up                                  restarttest

apo@oliphant:~ $ sudo ./nerdctl rm restarttest
FATA[0000] 1 errors:
multiple IDs found with provided prefix: restarttest

Note the error message above ^

Describe the results you received and expected

This should never happen.

What version of nerdctl are you using?

1.7.6

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

apostasie commented 3 months ago

Actually, I have a reproducer:

while true; do sudo ./nerdctl run -d --restart always --name whatevername debian bash;  done

After some time, this will result in:

apo@oliphant:~ $ sudo ./nerdctl ps -a | grep whatevername
016ee92893a2    docker.io/library/debian:latest                               "exit 1"                  About a minute ago    Created                                                   whatevername
2a9140621055    docker.io/library/debian:latest                               "exit 1"                  About a minute ago    Created                                                   whatevername
3b294997c56c    docker.io/library/debian:latest                               "bash"                    51 seconds ago        Restarting (0) 8 seconds ago                              whatevername
5fb34e4cd5a8    docker.io/library/debian:latest                               "bash"                    About a minute ago    Restarting (0) 8 seconds ago                              whatevername
86606352a9af    docker.io/library/debian:latest                               "exit 1"                  About a minute ago    Created                                                   whatevername
b3eebcb75094    docker.io/library/debian:latest                               "exit 1"                  About a minute ago    Created                                                   whatevername

apostasie commented 3 months ago

Maybe the culprit here is the fact that the container exits almost immediately, but has a restart policy?

apostasie commented 3 months ago

@AkihiroSuda this one is really bad

I'll look around and see if I can send a patch.

apostasie commented 3 months ago

I am new here, but this is what I gathered from reading the source: it looks like names are managed in namestore.go, with a filesystem-based store (see the sketch after this list). Obviously, this is racy, so I assume either:

  1. there must be a lock mechanism in there that is not doing what it should
  2. or the name is temporarily released for some other reason during the whole exit / restart cycle

I am leaning towards number 2, but any opinion / more informed insight on this would be welcome.
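
For context, here is a simplified sketch of what such a filesystem-backed name store roughly looks like (illustrative only - not the actual nerdctl code): each acquired name is a file in a per-namespace directory whose content is the container ID, and releasing the name removes the file.

package namestoresketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// Store is a toy stand-in for nerdctl's filesystem-backed name store.
type Store struct {
	dir string // e.g. /var/lib/nerdctl/<hash>/names/<namespace>
}

// Acquire claims a name for a container ID; it fails if the name file already exists.
func (s *Store) Acquire(name, id string) error {
	f, err := os.OpenFile(filepath.Join(s.dir, name), os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		if os.IsExist(err) {
			return fmt.Errorf("name %q is already in use", name)
		}
		return err
	}
	defer f.Close()
	_, err = f.WriteString(id)
	return err
}

// Release frees a name by removing its file.
func (s *Store) Release(name, id string) error {
	return os.Remove(filepath.Join(s.dir, name))
}

If anything removes that file while the container still exists, a later run with the same name will succeed, which matches the duplicates above.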

Thanks!

apostasie commented 3 months ago

Confirming there is a serious issue with the name store:

apo@oliphant:~ $ sudo ./nerdctl ps -a | grep whatever
ab08a16d1d74    docker.io/library/debian:latest                               "bash"                    7 minutes ago    Restarting (0) 6 seconds ago                              whatevername
b249f8f569bb    docker.io/library/debian:latest                               "bash"                    7 minutes ago    Restarting (0) 6 seconds ago

apo@oliphant:~ $ sudo ls -lA /var/lib/nerdctl/1935db59/names/default/ | grep whatever <- returns nothing - the file in the name store is gone.

apostasie commented 3 months ago

Surprisingly, namestore.Release does not seem to ever be called - is there another routine altering the name store?

Release is being called from somewhere, though. This seems to be why this is racy: something releases the name when it should not.

apostasie commented 3 months ago

@AkihiroSuda I need help.

The problem lies in: https://github.com/containerd/nerdctl/blob/main/pkg/ocihook/ocihook.go#L486

During the run/restart cycle, onPostStop is called, and it does release the name. The race condition happens if another Create comes in after that, before the first container is restarted and gets to lock the name again.
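
To make the interleaving concrete, here is an illustrative sketch of the shape of the problem (the onPostStop and Release names come from the code referenced above; everything else is assumed for the example, not the real ocihook implementation):

package ocihooksketch

import "fmt"

// nameStore stands in for nerdctl's filesystem-backed name store.
type nameStore interface {
	Acquire(name, id string) error
	Release(name, id string) error
}

// onPostStop mirrors the shape of the post-stop hook: it runs on every stop,
// including stops that the restart policy will immediately follow with a restart.
func onPostStop(store nameStore, containerID, name string) error {
	if name == "" {
		return nil
	}
	// Releasing the name here opens the window: between this Release and the
	// restarted container's next Acquire, a concurrent `nerdctl run --name`
	// can claim the name and create a duplicate.
	if err := store.Release(name, containerID); err != nil {
		return fmt.Errorf("failed to release name %q: %w", name, err)
	}
	return nil
}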

The reproducer is actually much simpler - just start a failing container with --restart always, and after some time the name file will just disappear from /var/lib/nerdctl/XXX/names/default/.

Problem now is that I am out of my depth here.

Do you have any insight for me here on what to do to fix this?

I am now thinking the entire namestore is a bad idea and we should rather query containerd instead of trying to maintain our own "db".

Thanks in advance.

AkihiroSuda commented 3 months ago

query containerd

This may increase the lookup cost from O(1) to O(n)?

apostasie commented 3 months ago

query containerd

This may increase the lookup cost from O(1) to O(n)?

Probably (unless containerd lookup methods are smarter than just enumerating all containers).

It seems to me the OCI lifecycle events are just not enough for us to maintain a copy of the state on our side. postStop just tells us that a container is done - and not whether it is going to be restarted or not.

Here is my current train of thought:

Solution A: keep namestore, and patch ocihook

Pretty much, in onStartContainer, call namestore.Acquire again (I have not verified that containerd will actually trigger these hooks in that case - hopefully it does).

It will not fix the problem, but it will alleviate it by making the racy window much shorter: between postStop and onStart for the first (restart always, failing) container.
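
A minimal sketch of what that re-acquire could look like (illustrative only - the hook name is taken from the discussion, the signature and the rest are assumptions):

package ocihooksketch

import "fmt"

// acquireFunc stands in for the name store's Acquire(name, id) operation.
type acquireFunc func(name, id string) error

// onStartContainer re-claims the name for this container ID on every (re)start.
// If a concurrent create already took the name, this fails loudly instead of
// silently allowing a duplicate.
func onStartContainer(acquire acquireFunc, containerID, name string) error {
	if name == "" {
		return nil
	}
	if err := acquire(name, containerID); err != nil {
		return fmt.Errorf("failed to re-acquire name %q for %s: %w", name, containerID, err)
	}
	return nil
}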

Solution B: remove namestore entirely, and query containerd

containerNameStore.Acquire(name, id) becomes client.Containers(ctx, fmt.Sprintf("labels.%q==%s", "nerdctl/name", name))

The upside is that we stop trying to keep state on our side (pretty sure there are other ways to fuck it up besides this specific issue). The downside is performance. Also, I am not 100% positive this won't still be racy (it really depends on how containerd implements that).
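
For reference, a rough sketch of that lookup against the containerd Go client, assuming the label filter expression above does what we want:

package namelookupsketch

import (
	"context"
	"fmt"

	containerd "github.com/containerd/containerd"
)

// nameInUse asks containerd whether any container in the current namespace
// already carries the "nerdctl/name" label with the given value.
func nameInUse(ctx context.Context, client *containerd.Client, name string) (bool, error) {
	filter := fmt.Sprintf("labels.%q==%s", "nerdctl/name", name)
	containers, err := client.Containers(ctx, filter)
	if err != nil {
		return false, err
	}
	return len(containers) > 0, nil
}

Note this is only as race-free as containerd's own create path: two concurrent creates could both see zero matches before either of them commits.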

Thoughts?

AkihiroSuda commented 3 months ago

I guess we can try A first and see if it practically works. We may try B later if A is still too racy.

apostasie commented 3 months ago

Ok. I'll send a PR.

apostasie commented 2 months ago

@AkihiroSuda

Unfortunately, I am seeing this somewhat often while parallelizing tests. We might have to revisit this and consider plan B.