Closed: jtudelag closed this issue 5 years ago.
This looks like an issue with the refresh() code, which runs once, the first time Podman detects a reboot. @jtudelag Does this only happen after a reboot? If not, is there anything special about your /var/run? We place a file there so we know that podman has successfully initialized after a reboot. This code should not run more than once between reboots, and could cause serious bugs if it does.
The actual issue reported seems to be caused by a container being removed out from under us in c/storage before refresh() can run. We can handle this in two ways: 1) remove the container in refresh(), since it no longer exists in storage; 2) continue even if the container no longer exists, knowing that accesses to its root filesystem will fail (removing it with podman rm should still succeed, as that is durable against this sort of error). @rhatdan Which would you prefer?
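Option 2 could look roughly like the following sketch (all types and names here are illustrative stand-ins, not libpod's actual API): refresh walks every container, logs and skips any whose storage has vanished, and leaves the record in the database so podman rm can still remove it later.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchCtr stands in for the "no such container" error that
// c/storage reports when a container's storage was removed under us.
var errNoSuchCtr = errors.New("no such container")

// ctr is a hypothetical, minimal stand-in for a libpod container record.
type ctr struct {
	id      string
	inStore bool // whether c/storage still has this container
}

// refreshOne simulates re-initializing one container's runtime state;
// it fails if the backing storage is gone.
func refreshOne(c *ctr) error {
	if !c.inStore {
		return errNoSuchCtr
	}
	return nil
}

// refreshAll implements option 2: instead of aborting the whole
// refresh when one container's storage is missing, log the error,
// keep the record around, and move on, so the rest of the state
// stays usable and `podman rm` can later delete the stray record.
func refreshAll(ctrs []*ctr) (failed []string) {
	for _, c := range ctrs {
		if err := refreshOne(c); err != nil {
			fmt.Printf("Error refreshing container %s: %v\n", c.id, err)
			failed = append(failed, c.id)
		}
	}
	return failed
}

func main() {
	ctrs := []*ctr{
		{id: "aaa", inStore: true},
		{id: "bbb", inStore: false}, // e.g. removed by buildah before reboot
	}
	fmt.Println("failed:", refreshAll(ctrs))
}
```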
So we could trigger this issue if buildah rmi?
Yeah, the likely scenario for this happening is buildah removing storage for a container and then the system restarting.
I think we should require the user to rm the container; I would not want containers magically disappearing.
@mheon My FS layout is the Fedora default one. /var/run is a tmpfs FS:
df -h /var/run
Filesystem Size Used Avail Use% Mounted on
tmpfs 9.6G 2.1M 9.6G 1% /run
Also, I don't know how to reproduce this yet. I have tried rebooting the laptop while a container was running under podman, and everything was fine.
I have been working a lot recently with podman and buildah, so I might have removed a container image that I shouldn't have, but as I said, I'm hitting this issue very frequently...
It sounds like refresh() is running multiple times during the same boot. This is a serious issue, as it completely resets the state of all containers. We still need to fix the removed-containers issue, but I see the repeated calls to refresh() as the bigger problem.
I'm facing the same issue, but I don't know the steps to reproduce it... =/
Opened #1252 which should resolve the issue.
I'm fairly certain this comes down to refresh() never successfully completing, leaving the state partially configured and unusable. This is a partial solution: it prevents the fatal refresh loop, but it cannot salvage the containers or pods that hit errors. It should still be possible to remove them.
I ran into this as well.
I was able to build podman from git master (which includes #1252) and then run ./podman ps to refresh the list of containers. I did get this error at the time, but it was a one-time occurrence:
ERRO[0000] Error refreshing container b445ccb00641eec49117f46ea0a8d8b5457600b614f6379bb6ea058ad4f47ab5: error retrieving temporary directory for container b445ccb00641eec49117f46ea0a8d8b5457600b614f6379bb6ea058ad4f47ab5: no such container
Closing, as this should be fixed; re-open if needed.
Is this a BUG REPORT or FEATURE REQUEST?:
Description
It's constantly happening, and I don't know the exact steps to reproduce it. It seems the podman database (boltdb) is not consistent. Deleting the boltdb and re-running podman fixes it.
Here are the error messages I get:
Steps to reproduce the issue:
Don't know yet, but this is happening very frequently.
Describe the results you received:
error retrieving temporary directory for container XXXX: no such container
Additional information you deem important (e.g. issue happens only occasionally):
Deleting the Podman boltdb fixes the problem.
Output of podman version:
Output of podman info:
O.S. version: Fedora