messiah10 closed this 2 months ago.
There is another use case that produces the same error. If a commit of a container is interrupted, retrying the commit fails with:
failed to export layer: snapshot \"sha256:5ca359d74c4d65ad7dc2fc1013b3cccfa921f9000395ac846fd06a37c9a1a67e-parent-view\": already exists
I think this happens because there is no garbage collection of the bolt KVs, and the error is thrown in containerd's bolt module.
I think this error could be prevented by listening for interrupt or kill signals and garbage-collecting the stale KVs whenever a commit is interrupted before it completes.
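One caveat to the signal-handler idea is that SIGKILL cannot be caught, so a killed commit would still leave the view behind. An out-of-band cleanup before retrying covers that case. The sketch below assumes that the leftover snapshot is visible through `ctr` in the namespace nerdctl used ("default" for plain nerdctl, "k8s.io" under Kubernetes) and that its key ends in "-parent-view", as in the error above. The helper names are hypothetical and not part of nerdctl or containerd. Run it only when no commit is in progress, because a running commit may still be using its view.

```shell
#!/bin/sh
# Hypothetical helpers (not part of nerdctl/containerd) for removing
# "<parent>-parent-view" snapshots left behind by an interrupted commit.
# Only run them when no commit is in progress.

# Read `ctr snapshots ls` output on stdin and print the keys that end in
# "-parent-view". The "KEY PARENT KIND" header line never matches.
list_stale_parent_views() {
  awk '$1 ~ /-parent-view$/ { print $1 }'
}

# Remove every matching view snapshot in the given namespace.
# The namespace argument is an assumption; it defaults to "default".
cleanup_stale_parent_views() {
  ns="${1:-default}"
  ctr -n "$ns" snapshots ls \
    | list_stale_parent_views \
    | xargs -r ctr -n "$ns" snapshots rm
}
```

For example, run `cleanup_stale_parent_views k8s.io` on a Kubernetes node before retrying the failed commit.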
I had the same problem. It looks like a deadlock when committing images concurrently.
We expect concurrent commits to be a very common use case. Does this problem always occur when two commits run at the same time on a given system?
Confirming this is still a problem, albeit with different symptoms.
Repro:
nerdctl pull debian
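# remove any leftover container, silencing both stdout and stderr
nerdctl rm -f foo > /dev/null 2>&1
nerdctl run --name foo -d debian sleep Inf
while true; do
  # start two commits of the same container concurrently
  nerdctl commit foo bar &
  nerdctl commit foo bar2 &
  sleep 1
done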
Output after a few runs.
FATA[0000] failed to pause container: cannot pause a paused container: unknown
From reading the code, none of this is safe with respect to concurrency. We need a lock mechanism to prevent concurrent operations such as commit.
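Until nerdctl has such a lock, callers could serialize commits themselves. Here is a minimal sketch using flock(1) from util-linux. It uses a single host-wide lock rather than a per-container one. The reason is that the colliding key in the "-parent-view" errors appears to be derived from the parent snapshot, which containers built from the same base image share. The wrapper name and the `COMMIT_LOCK` path are hypothetical. The lock only protects commits started through the wrapper and does not fix nerdctl itself.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of nerdctl): run a command while holding an
# exclusive lock on one host-wide lock file, so that commits started through
# this wrapper run one at a time. COMMIT_LOCK is an assumed path.
with_commit_lock() {
  # flock creates the lock file if needed, blocks until the lock is free,
  # runs the command, and releases the lock when the command exits.
  flock "${COMMIT_LOCK:-/tmp/nerdctl-commit.lock}" "$@"
}
```

In the repro loop, each bare `nerdctl commit foo bar` would become `with_commit_lock nerdctl commit foo bar`. The two commits then still start in the background, but they no longer overlap.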
Environment: k8s 1.21.9, containerd 1.3.7, nerdctl 0.18.0
If two commit commands are run at the same time, one succeeds but the other fails with a snapshot "already exists" error.
I committed containers from different pods that were created from the same base image, giving each a different image name and tag:
time="2022-08-29T16:40:14+09:00" level=warning msg="Image lacks label \"nerdctl/platform\", assuming the platform to be \"linux/amd64\""
time="2022-08-29T16:40:14+09:00" level=fatal msg="failed to export layer: snapshot \"sha256:738a919f97cf7edfe7a17bbc6058ab8a79539400e6f6c084ddd4c0a2829bbfb3-parent-view\": already exists"