Closed: jtudelag closed this issue 5 years ago.
This looks like an issue with the refresh() code, which runs once, the first time Podman detects a reboot. @jtudelag Does this only happen after a reboot? If not, is there anything special about your /var/run? We place a file there so we know that podman has successfully initialized after a reboot. This code should not run more than once between reboots, and could cause serious bugs if it does.
The actual issue reported seems to be caused by a container being removed out from under us in c/storage before refresh() can run. We can handle this in two ways: 1) remove the container in refresh(), since it no longer exists in storage; 2) continue even if the container no longer exists, knowing that accesses to its root filesystem will fail (removing it with podman rm should still succeed, as that is durable against this sort of error). @rhatdan Which would you prefer?
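Option 2 could look roughly like the following sketch (all types and names here are illustrative stand-ins, not libpod's actual API): refresh walks every container, logs and skips any whose storage has vanished, and leaves the record in the database so podman rm can still remove it later.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchCtr stands in for the "no such container" error that
// c/storage reports when a container's storage was removed under us.
var errNoSuchCtr = errors.New("no such container")

// ctr is a hypothetical, minimal stand-in for a libpod container record.
type ctr struct {
	id      string
	inStore bool // whether c/storage still has this container
}

// refreshOne simulates re-initializing one container's runtime state;
// it fails if the backing storage is gone.
func refreshOne(c *ctr) error {
	if !c.inStore {
		return errNoSuchCtr
	}
	return nil
}

// refreshAll implements option 2: instead of aborting the whole
// refresh when one container's storage is missing, log the error,
// keep the record around, and move on, so the rest of the state
// stays usable and `podman rm` can later delete the stray record.
func refreshAll(ctrs []*ctr) (failed []string) {
	for _, c := range ctrs {
		if err := refreshOne(c); err != nil {
			fmt.Printf("Error refreshing container %s: %v\n", c.id, err)
			failed = append(failed, c.id)
		}
	}
	return failed
}

func main() {
	ctrs := []*ctr{
		{id: "aaa", inStore: true},
		{id: "bbb", inStore: false}, // e.g. removed by buildah before reboot
	}
	fmt.Println("failed:", refreshAll(ctrs))
}
```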
So we could trigger this issue if buildah rmi?
Yeah, the likely scenario for this happening is buildah removing storage for a container and then the system restarting.
I think we should require the user to rm the container; I would not want containers magically disappearing.
@mheon My FS layout is the Fedora default one. /var/run is a tmpfs FS:
df -h /var/run
Filesystem Size Used Avail Use% Mounted on
tmpfs 9.6G 2.1M 9.6G 1% /run
Also, I don't know how to reproduce this yet. I have tried rebooting the laptop while a container was running under podman, and everything was fine.
I have been working a lot recently with podman and buildah, so I might have removed a container image that I shouldn't have, but as I said, I'm hitting this issue very frequently...
It sounds like refresh() is running multiple times during the same boot. This is a serious issue, as it completely resets the state of all containers. We still need to fix the removed-containers issue, but I see the repeated calls to refresh() as the bigger problem.
I'm facing the same issue, but I don't know the steps to reproduce it... =/
Opened #1252 which should resolve the issue.
I'm fairly certain this comes down to refresh() never successfully completing, leaving the state partially configured and unusable. This is a partial solution: it prevents the fatal refresh loop, but it cannot salvage the containers or pods that hit errors. It should still be possible to remove them.
I ran into this as well.
I was able to build podman from git master (which includes #1252) and then run ./podman ps to refresh the list of containers. I did get this error at the time, but it was a one-time occurrence:
ERRO[0000] Error refreshing container b445ccb00641eec49117f46ea0a8d8b5457600b614f6379bb6ea058ad4f47ab5: error retrieving temporary directory for container b445ccb00641eec49117f46ea0a8d8b5457600b614f6379bb6ea058ad4f47ab5: no such container
Closing, as this should be fixed; re-open if needed.
Is this a BUG REPORT or FEATURE REQUEST?:
Description
It's constantly happening, and I don't know the exact steps to reproduce it. It seems the podman database (boltdb) is not consistent. Deleting the boltdb and re-running podman fixes it.
Here are the error messages I get:
Steps to reproduce the issue:
Don't know yet, but this is happening very frequently.
Describe the results you received:
error retrieving temporary directory for container XXXX: no such container
Additional information you deem important (e.g. issue happens only occasionally):
Deleting the Podman boltdb fixes the problem.
Output of podman version:
Output of podman info:
O.S. version: Fedora