containers/podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

podman not gracefully cleaning up attach socket and fifo ctl #3436

Closed: space88man closed this issue 5 years ago

space88man commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description: Three containers share the network namespace of the first container.

When the containers are shut down gracefully, i.e., podman stop con1 con2 con3, the attach socket and ctl FIFO are left behind.
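
For context, a minimal sketch of how such a triplet might be created. The container names come from the report; the image and commands are placeholders, not taken from the thread:

# Hypothetical setup: con2 and con3 join con1's network namespace.
podman create --name con1 alpine sleep 1000000
podman create --name con2 --network container:con1 alpine sleep 1000000
podman create --name con3 --network container:con1 alpine sleep 1000000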

Steps to reproduce the issue:

1. Start the containers, stop them gracefully, then reboot the host:

podman start con1
podman start con2 con3
# con2 and con3 are using the network namespace of con1
podman stop con1 con2 con3
systemctl reboot  # reboot the host

2. After the reboot, try to start the first container again:

podman start con1
Failed to bind attach socket: /var/run/libpod/socket/75a62e9f958cd36a99148d074ec408d9
Failed to mkfifo /var/lib/containers/storage/overlay-containers/75a62e9f958cd36a99148d074ec408d9cc

Describe the results you received: The container cannot start because of the leftover attach socket and ctl FIFO.

Describe the results you expected: The container starts.
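
One possible stopgap, assuming the leftover files live under the paths named in the errors above (an assumption, not a step from the thread), is to remove them by hand before starting the container:

# Hypothetical cleanup; substitute the full container ID, e.g. from
# `podman ps -a --no-trunc`. The directory is taken from the first
# error message above.
rm -rf /var/run/libpod/socket/<container-id>
# The mkfifo path in the report is truncated, so inspect the container's
# storage directory rather than guessing the exact FIFO name:
ls /var/lib/containers/storage/overlay-containers/<container-id>/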

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

podman version
Version:            1.4.2
RemoteAPI Version:  1
Go Version:         go1.12.5
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.12.5
  podman version: 1.4.2
host:
  BuildahVersion: 1.9.0
  Conmon:
    package: podman-1.4.2-1.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 0.2.0, commit: d7234dc01ae2ef08c42e3591e876723ad1c914c9'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 59941171200
  MemTotal: 67525967872
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 16
  hostname: podman.dev.localhost
  kernel: 5.1.12-300.fc30.x86_64
  os: linux
  rootless: false
  uptime: 10m 23.61s
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 16
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mountopt=nodev,metacopy=on
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  ImageStore:
    number: 20
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.): physical

mheon commented 5 years ago

Seems like the cleanup process might not be firing. Running with --log-level=debug and --syslog might reveal issues.
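
For instance, one way to capture that output (a sketch; con1 is the container from the report):

# Re-run the failing start with debug logging sent to the system log:
podman --log-level=debug --syslog start con1
# Then look for cleanup-related messages in the journal:
journalctl --since "10 minutes ago" | grep -i podman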

giuseppe commented 5 years ago

@mheon could it be the same issue we have seen recently where it was needed to run "podman system renumber"?

@space88man could you try "podman system renumber" before attempting to start the containers?

space88man commented 5 years ago
# podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
# podman system renumber
Error: Error shutting down container storage: A layer is mounted: layer is in use by a container

Nevertheless, my set of three containers is now able to start up.
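
The storage error suggests a container layer was still mounted when renumber tried to shut down container storage. A possible way to check for and clear that state (the unmount step is an assumption, not something suggested in the thread):

# Show all containers, including stopped ones:
podman ps -a
# Hypothetical fix: unmount any container root filesystems that are
# still mounted, then retry the renumber:
podman umount --all
podman system renumber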

baude commented 5 years ago

@mheon, what are your thoughts here?

mheon commented 5 years ago

This is almost certainly the system renumber issue we encountered with toolbox: improper lock management in 1.1.x through 1.3.1 revealing itself after an upgrade to 1.3.2 or higher. Running podman system renumber (ideally while no containers are running; it may also be a good idea to reboot the system afterwards, or to run it immediately after a fresh boot) will resolve the issue.
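
A minimal sketch of that recovery sequence, assuming all containers can be stopped first:

# Stop everything so no container holds a lock:
podman stop --all
# Rewrite the lock allocations left inconsistent by the upgrade:
podman system renumber
# Optionally reboot, per the recommendation above:
systemctl reboot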