coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Orphaned pods due to leftover volume mount directories #2082

Open cehoffman opened 7 years ago

cehoffman commented 7 years ago

Issue Report

Bug

kubelet-wrapper leaves behind orphaned pods whose volume directories remain on disk after the volumes are no longer mounted. I believe this is related to #1831. It results in log lines like this:

Orphaned pod "351b9041-7531-11e7-a72a-000d3a03c356" found, but volume paths are still present on disk.
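To see how many pods are affected on a node, the UIDs can be pulled out of the kubelet log. This is a minimal sketch that assumes the message format is exactly the one quoted above; the function name `orphaned_pod_uids` is made up for illustration.

```shell
# Print the unique pod UIDs named in "Orphaned pod" kubelet log lines.
# Reads log text on stdin. Assumes the message format quoted in this report.
orphaned_pod_uids() {
  sed -n 's/.*Orphaned pod "\([0-9a-f-]*\)" found.*/\1/p' | sort -u
}

# Usage on an affected node (not executed here):
#   journalctl -u kubelet.service --no-pager | orphaned_pod_uids
```

Lines that do not match the message are ignored, so the whole journal can be piped in unfiltered.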

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.8.0
VERSION_ID=1353.8.0
BUILD_ID=2017-05-30-2322
PRETTY_NAME="Container Linux by CoreOS 1353.8.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"

Cannot upgrade to the latest stable release because of coreos/tectonic-installer#1171. Also tested on 1409.7.0, with the same problem.

Environment

This is in Azure public cloud.

Expected Behavior

Orphaned pods are not left behind due to empty volume folders from unmounted volumes.

Actual Behavior

Orphaned pods are left behind.
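The leftover directories for a given pod UID can be listed with something like the sketch below. It assumes the kubelet's default root directory (`/var/lib/kubelet`) and the usual `pods/<uid>/volumes/<plugin>/<volume>` layout; a node with a custom root dir will need a different first argument. The function name `leftover_volume_dirs` is hypothetical, and the script only inspects, it does not remove anything.

```shell
# List volume directories still on disk for one pod UID, marking each as
# "empty" or "non-empty".
#   $1 = kubelet root dir (assumed default: /var/lib/kubelet)
#   $2 = pod UID taken from the "Orphaned pod" log line
leftover_volume_dirs() {
  root=$1
  pod_uid=$2
  for d in "$root/pods/$pod_uid/volumes"/*/*/; do
    # If the glob matched nothing it stays literal and is skipped here.
    [ -d "$d" ] || continue
    if [ -z "$(ls -A "$d")" ]; then
      state=empty
    else
      state=non-empty
    fi
    printf '%s %s\n' "$state" "${d%/}"
  done
}

# Usage on an affected node (not executed here):
#   leftover_volume_dirs /var/lib/kubelet 351b9041-7531-11e7-a72a-000d3a03c356
```

An `empty` entry corresponds to the case described in this report: the volume is gone but its directory is still there.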

Reproduction Steps

  1. Create a Kubernetes cluster with Azure cloud provider integration using tectonic-installer
  2. Create a default storage class for Azure disk provisioning
kind: StorageClass
metadata:
  name: default
provisioner: kubernetes.io/azure-disk
  3. Create a PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orphan-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  4. Create a pod that uses the volume
  5. Delete the pod
  6. Watch the kubelet.service logs to see orphaned pods
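The report does not include the pod manifest, so here is one possible way to carry out the last three steps. Everything except the claim name `orphan-test` is an assumption for illustration: the pod name `orphan-test-pod`, the `busybox` image, the `/data` mount path, and the helper name `orphan_test_pod_manifest`.

```shell
# Print a minimal Pod manifest that mounts the orphan-test claim.
# Pod name, image, and mount path are illustrative, not from the report.
orphan_test_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: orphan-test-pod
spec:
  containers:
    - name: sleeper
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: orphan-test
EOF
}

# Usage against a test cluster (not executed here):
#   orphan_test_pod_manifest | kubectl create -f -
#   kubectl delete pod orphan-test-pod
#   journalctl -u kubelet.service -f    # run on the node that hosted the pod
```

The container only sleeps, so the pod does nothing beyond getting the disk attached and mounted before it is deleted.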

Other Information

The same behavior also occurs with Portworx volumes. Since Azure disks suck for actual usage in Kubernetes right now, making sure the fix works for Portworx volumes is the higher priority.

pheuter commented 6 years ago

Seeing this issue come up now on Tectonic 1.8.4-tectonic.3 running Kubernetes v1.8.4+coreos.0 and Docker version 17.03.2-ce, build 2360430, on Azure with managed Azure disk provisioning. This used to work and is now a big blocker for our production deployments.