#4383 (closed 5 years ago)
Does the original container actually exist?
Try a podman pod ps
and see if there are any pods with that name/ID.
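For example (the grep pattern is just a placeholder for the name from the error):
$ podman pod ps
$ podman ps -a | grep <container-name>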
There could be a race condition here, where one container is exiting and running podman cleanup while another container is launching.
I have already tried podman ps
and the container doesn't exist in the list...
Just to be sure, did you try podman ps -a
to show all the containers?
Yep, see the bug description.
I have tried the following commands to find an existing container with the same name, and no results were found:
$ sudo podman ps
$ # no results found
$ sudo podman ps -a
$ # no results found
$ sudo podman volume list
$ # no results found and no volumes exist
Volumes don't share names with pods and containers, so podman volume list
doesn't really help.
Can you try podman pod ps
? Pods do share names with containers
@mheon I'm sorry but I'm not sure I can reproduce this issue every time (it's a bit random) and I have already reset my env... If I face it again, let me append my traceback and command outputs here, especially podman pod ps
If you do manage to reproduce again, and the pod check comes back negative, also append /var/lib/containers/storage/libpod/bolt_state.db
There's a small chance we have some sort of state corruption going on, but I would think we would have hit this before if so.
I attempted something like this: for i in {1..100}; do podman run --name dan --rm fedora echo hello; done to see if we could be suffering from a race condition.
But nothing failed.
If I reproduce it again I will add all the information to this issue.
FWIW when I was looking at this with Herve I did try a 'podman pod ps' and it returned empty. Hopefully we can collect /var/lib/containers/storage/libpod/bolt_state.db next time we hit this
you can try the better reproducer version from https://github.com/containers/libpod/issues/1656 (just drop the docker'ish part of it)
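A hedged sketch of what a more aggressive reproducer could look like (not necessarily what #1656 uses; it assumes -d and --rm can be combined in this version, so the cleanup of one iteration races with the create in the next):
$ for i in {1..100}; do podman run -d --rm --name dan fedora true; sleep 0.2; done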
Well... I have successfully reproduced the problem...
podman pod ps
is always empty.
I have extracted the bolt_state.db as @mbaldessari suggested and attached it to this comment.
bolt_state.db.zip
I was able to get into this state also, by calling podman rm -f
on a running container.
The containers seem to be remaining in c/storage, preventing us from creating new containers with the same names.
Part of the problem seems to be c/storage not being durable enough under stress - it seems to start failing to delete containers a lot sooner than the rest of Podman. When it does, we still get rid of as much of the container as we can (rather than leave a half-configured container around), but the lingering c/storage container conflicts with new containers with the same name. We should look into why c/storage is failing here.
I'm not sure if we have a good option for deleting the lingering storage containers... For all we know, they're valid buildah
or CRI-O
containers (we'll know they don't belong to CRI-O
soon enough, but there are no plans to put buildah
on libpod), so we can't safely delete them.
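(As an aside, and assuming I have the flag right, Buildah can list containers created by other tools, which would at least make the lingering c/storage container visible:)
$ buildah containers --all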
@mheon interesting analysis
I think we should add something to rm --removestorage, which would ignore the error from libpod saying the container does not exist, and remove the storage.
@rhatdan Should we just recommend they use buildah rm
to get rid of it? I'd almost prefer that to adding potentially confusing options to podman rm
We call Buildah for building stuff; could we call rm too in error cases as a final hammer, rather than telling the user to?
The problem is, we've already called into c/storage in this case, and it's failed - I don't know if hammering it more by calling it again through Buildah would help...
(We really ought to just drill into why c/storage is failing in these cases - it seems like making it more stable would be beneficial for all our tools)
Are we sure that an error happened, or was this a race condition?
@rhatdan Do you still have your reproducer? I'm expecting that we're getting errors out of c/storage, and we'd be printing them in that case
I am not crazy about requiring buildah to be installed to get us out of a state where the container image was accidentally left around.
If I do a podman rm --force foobar, the user would expect the container to be removed and then be able to do podman run --name foobar.
We can add documentation to podman rm --force foobar indicating that this will remove not only podman containers named foobar but could remove containers created by other tools.
I have merged in a fix for podman rm --force that will remove a container that libpod does not know about.
This will get you out of this situation.
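So the expected flow once that fix is in would be roughly (foobar and the image are just placeholders):
$ podman rm --force foobar
$ podman run --name foobar fedora echo hello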
@mheon @rhatdan re: buildah:
No it didn't help:
[root@overcloud-controller-0 ~]# buildah rm container-puppet-horizon
error removing container "container-puppet-horizon": error reading build container: error reading "/var/lib/containers/storage/overlay-containers/4f6edfeabb7993024c78c50ae791aaa66c0d819c3e57e47d927bd71fc1657b40/userdata/buildah.json": open /var/lib/containers/storage/overlay-containers/4f6edfeabb7993024c78c50ae791aaa66c0d819c3e57e47d927bd71fc1657b40/userdata/buildah.json: no such file or directory
Buildah 1.5 (RHEL8)
I think we need to re-open this one; we managed to reproduce it with the latest podman:
[root@overcloud-controller-0 ~]# podman run --rm -ti --name container-puppet-horizon my-registry:8888/rhosp15/openstack-horizon:latest bash
Error: error creating container storage: the container name "container-puppet-horizon" is already in use by "4f6edfeabb7993024c78c50ae791aaa66c0d819c3e57e47d927bd71fc1657b40". You have to remove that container to be able to reuse that name.: that name is already in use
[root@overcloud-controller-0 ~]# podman --log-level=debug run --rm -ti --name container-puppet-horizon my-registry:8888/rhosp15/openstack-horizon:latest bash
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] overlay test mount with multiple lowers succeeded
DEBU[0000] overlay test mount indicated that metacopy is not being used
DEBU[0000] backingFs=xfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
WARN[0000] Error loading CNI config list file /etc/cni/net.d/87-podman-bridge.conflist: error parsing configuration list: unexpected end of JSON input
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]my-registry:8888/rhosp15/openstack-horizon:latest"
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] exporting opaque data as blob "sha256:381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] exporting opaque data as blob "sha256:381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] Using bridge netmode
DEBU[0000] appending name container-puppet-horizon
DEBU[0000] Allocated lock 2 for container 5f6d48335a5ee864520feb4db0e2116f818323cf2c9f77d782f17b8d108ca83d
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]@381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] exporting opaque data as blob "sha256:381746125bf6715a1722e148aba07ff69d40a75c661a20182b88e46f0b8b0642"
DEBU[0000] failed to create container container-puppet-horizon(5f6d48335a5ee864520feb4db0e2116f818323cf2c9f77d782f17b8d108ca83d): the container name "container-puppet-horizon" is already in use by "4f6edfeabb7993024c78c50ae791aaa66c0d819c3e57e47d927bd71fc1657b40". You have to remove that container to be able to reuse that name.: that name is already in use
ERRO[0000] error creating container storage: the container name "container-puppet-horizon" is already in use by "4f6edfeabb7993024c78c50ae791aaa66c0d819c3e57e47d927bd71fc1657b40". You have to remove that container to be able to reuse that name.: that name is already in use
[root@overcloud-controller-0 ~]# rpm -qa|grep podman
podman-1.2.0-1.git3bd528e.module+el8+2977+701c9eaf.x86_64
The latest podman should get you out of it, i.e. podman rm -f CID should remove the container even if it is not known in podman's database.
I tested with podman-1.2.0-1.git3bd528e.module+el8+2977+701c9eaf.x86_64 as you can see in my previous comment and it didn't work. This rpm has your patch iiuc.
podman rm didn't work, you can ask @mheon, he saw it while we were debugging.
The issue here seems to be that some Podman command, run between the container being started and the container being removed, is executed in a container without /var/run
from the host mounted (or, otherwise, missing the /var/run/libpod/alive
file we use to check to see if the system has restarted, plus whatever c/storage uses for the same thing). This causes us to lose track of container status - whether it's been mounted, how many times, etc. When we attempt to remove the container, it's still mounted, but c/storage doesn't know this (it lost the mount counter, I believe?), so we get a failure as it's still in use.
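A quick sanity check, assuming the default tmp dir shown in the debug log above, is whether that file is actually visible from the environment where the podman commands are being run:
$ ls -l /var/run/libpod/alive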
I can partially work around this on the Podman side by making our refresh code smarter (but slower) and actually querying c/storage and runc to see what the container is doing at the moment.
However, I can't fix c/storage losing the mount counter because /var/run was changed, so I can't directly fix this on the Podman side.
Also, unfortunately, buildah rm
no longer works on containers without a buildah.json
.
This means we no longer have a way of working directly with c/storage containers that get orphaned.
We should fix it, so that it also removes container images when told to --force.
I don't see how that will help here?
It would give you a way to clean up.
I don't think that helps us? The issue here is that we don't know the container is mounted, so it's not unmounted, so attempting to remove storage doesn't work.
@mheon what's the latest on this?
This probably overlaps with the work we were talking about to show c/storage containers in podman ps
with a flag, and allow removal with podman rm
We've dealt with these via podman rm --storage
on upstream (though there are plans to add a podman ps --storage
to show all containers in c/storage as well)
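For reference, the invocation is just (the name/ID is a placeholder):
$ podman rm --storage <name-or-ID>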
This is not fixed and podman rm --storage
doesn't work for me either.
E.g.
$ podman run --pod foo --name foo-postgres -d postgres:9.6
Error: error creating container storage: the container name "mf-postgres" is already in use by "04dfc7232d5bda23990c441c825ec56d138c1ff87f34082134c23bb8fd887324". You have to remove that container to be able to reuse that name.: that name is already in use
$ podman rm -f --storage foo-postgres
foo-postgres
Error: error removing storage for container "foo-postgres": unlinkat /home/greg/.local/share/containers/storage/overlay/c4b778bbff10d826fe1c837b0147e8e51b9e539eda76a0dc34a9633dedcedad9/merged: device or resource busy
Something is likely mounted at that directory (specifically, it seems like fuse-overlayfs
failed to cleanly unmount). You might want to try unmounting it in a podman unshare
shell, then removing once that's done.
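Roughly something like this, reusing the path from the error above (the exact unmount step may differ depending on how fuse-overlayfs left things):
$ podman unshare
# umount /home/greg/.local/share/containers/storage/overlay/c4b778bbff10d826fe1c837b0147e8e51b9e539eda76a0dc34a9633dedcedad9/merged
# exit
$ podman rm --force --storage foo-postgres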
Why should I care about these things as a regular user? Removing a container should work without any crazy workarounds and hacking around bugs, especially because podman wants to be a drop-in replacement for Docker.
We are aware of this issue, and I understand that it sucks - this is definitely something that Podman should be handling automatically. There's an issue somewhere in containers/storage where containers can be registered as successfully unmounted despite the unmount failing, so our tools don't know they have to unmount on trying to remove. Thus far, this has been a very rare occurrence, so hopefully you won't have to worry about this again. If you can consistently reproduce, though, we'd love to have your help tracking this one down - it's very difficult to figure out what's going wrong when we can't manage to reproduce the issue ourselves.
Is the issue I'm having related to this? I am trying to create a container with the same name as one that was already removed, and it is failing:
Error: error creating container storage: the container name "mc_guacgui" is already in use by "18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4". You have to remove that container to be able to reuse that name.: that name is already in use
[bryan@fedora-laptop]~/containers/guac_mc$ podman rm -f 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4
Error: Failed to evict container: "": Failed to find container "18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4" in state: no container with name or ID 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4 found: no such container
How can I fix this?
Try podman rm --force --storage 18f2f24865aa7ba60d5eafd4eef55a49c987ee487b7890f6aa2c5849432a8fa4
and see if that works.
Thanks, that got me past that point but now it looks like it resulted in some sort of permissions problem:
Error: creating file '/home/bryan/.local/share/containers/storage/overlay/87b107153a17c9044c38656eed59f8273f85c01fad9051f599f798d6005ae057/merged/run/secrets': Permission denied: OCI runtime permission denied error
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
A script launches the following command to start a container with the
--rm
flag so the container will be destroyed at exit, but when I try to recreate the container manually with the same podman command, podman fails to create the container and displays the following error. When I try to inspect for an existing volume or anything like that, I don't find any results:
Looks similar to #1359
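For illustration, the sequence from the description is roughly (image and name are placeholders; the second run fails with the "name is already in use" error):
$ podman run --rm --name mycontainer fedora echo hello
$ podman run --rm --name mycontainer fedora echo hello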
Steps to reproduce the issue:
Run the podman run --rm command twice.
Describe the results you received:
Describe the results you expected:
I expect the container to be created.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Output of podman info:
Additional environment details (AWS, VirtualBox, physical, etc.): KVM