containerd / nerdctl

Failed to execute nerdctl commit commands at the same time #1391

Closed messiah10 closed 2 months ago

messiah10 commented 2 years ago

k8s 1.21.9, containerd 1.3.7, nerdctl 0.18.0

If two commit commands are run at the same time, one succeeds but the other fails with a "digest already exists" error message.

I committed different pods that were created from the same base image, and assigned different image names and tags.

time="2022-08-29T16:40:14+09:00" level=warning msg="Image lacks label \"nerdctl/platform\", assuming the platform to be \"linux/amd64\""
time="2022-08-29T16:40:14+09:00" level=fatal msg="failed to export layer: snapshot \"sha256:738a919f97cf7edfe7a17bbc6058ab8a79539400e6f6c084ddd4c0a2829bbfb3-parent-view\": already exists"

2000yeshu commented 2 years ago

There is another use case where we see the same error: when we commit a container, the commit is interrupted, and we then retry the commit, we get failed to export layer: snapshot "sha256:5ca359d74c4d65ad7dc2fc1013b3cccfa921f9000395ac846fd06a37c9a1a67e-parent-view": already exists. I think this happens because there is no garbage collection of the bolt KVs, and the error is thrown in containerd's bolt module.

2000yeshu commented 2 years ago

I think this error can be prevented if we listen for interrupt or kill signals and garbage-collect any leftover KVs when a commit is interrupted before completion.

Belyenochi commented 2 years ago

I had the same problem. It looks like a deadlock between concurrent image commits.

mattjwarren commented 1 year ago

We expect commits happening at the same time to be a very common use case. Does this problem always occur when two commits run at the same time on a given system?

apostasie commented 2 months ago

Confirming this is still a problem, albeit with different symptoms.

Repro:

nerdctl pull debian

nerdctl rm -f foo > /dev/null 2>&1
nerdctl run --name foo -d debian sleep Inf

while true; do
  nerdctl commit foo bar &
  nerdctl commit foo bar2 &
  sleep 1
done

Output after a few runs:

FATA[0000] failed to pause container: cannot pause a paused container: unknown

apostasie commented 2 months ago

Reading the code, none of this is safe with respect to concurrency. We need a locking mechanism to prevent concurrent operations such as commit.
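
Until such a lock exists inside nerdctl, one possible caller-side workaround is to serialize commits with an external file lock. The following is only a minimal sketch, assuming flock(1) from util-linux is installed; `locked_commit`, `LOCK_FILE`, and `COMMIT_CMD` are hypothetical names and not part of nerdctl. It uses a single host-wide lock rather than one lock per container, because the first report above involved different containers that shared a base image.

```shell
# Hypothetical wrapper, not part of nerdctl: run one commit at a time on this host.
# LOCK_FILE and COMMIT_CMD are assumed names; COMMIT_CMD can be overridden for testing.
LOCK_FILE="${LOCK_FILE:-/tmp/nerdctl-commit.lock}"
COMMIT_CMD="${COMMIT_CMD:-nerdctl commit}"

locked_commit() {
  # flock(1) takes an exclusive lock on LOCK_FILE, runs the command,
  # and releases the lock when the command exits.
  # COMMIT_CMD is intentionally unquoted so it splits into command and arguments.
  flock "$LOCK_FILE" $COMMIT_CMD "$@"
}
```

With this wrapper, the loop body from the repro would become `locked_commit foo bar & locked_commit foo bar2 &`, and the second commit waits for the first to finish. It only helps if every caller goes through the wrapper, and it does not clean up state left behind by a commit that was interrupted earlier.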