gluster / gluster-subvol

Subdirectories of Gluster volumes as PVs in Kubernetes and OpenShift
Apache License 2.0

Pods stuck in "terminating" state if fuse dies #34

Closed JohnStrunk closed 5 years ago

JohnStrunk commented 5 years ago

If the fuse process dies, pods get stuck and won't terminate. The error commonly seen is 'transport endpoint not connected'. The system is left in a state in which kubelet is confused about the state of the mount.

Investigate whether auto_unmount provides better error recovery, and implement it if it does.

JohnStrunk commented 5 years ago

Background: The gluster supervol is mounted once on a node, generating a single fuse mount (and process). A subdirectory within this mount is then mounted onto the designated location via mount --bind. When the fuse process dies without unmounting, processes that try to access files within either of the two mountpoints get a transport endpoint not connected error.
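For illustration only, here is a minimal sketch of how a script could recognize this failure on a given mountpoint. It assumes the error surfaces as ENOTCONN from a plain stat() call; the function name `is_disconnected` and the injectable `stat_fn` parameter are hypothetical and not part of gluster-subvol.

```python
import errno
import os


def is_disconnected(path, stat_fn=os.stat):
    """Return True if accessing `path` fails with ENOTCONN.

    ENOTCONN is the errno behind "transport endpoint not connected".
    `stat_fn` is injectable (hypothetical parameter) so the check can be
    exercised without a real dead fuse mount.
    """
    try:
        stat_fn(path)
    except OSError as exc:
        # Any other error (for example ENOENT) is not the dead-fuse case.
        return exc.errno == errno.ENOTCONN
    return False
```

Both the supervol mountpoint and the bind-mounted subdirectory would be expected to report True once the fuse process is gone, since both resolve into the same fuse filesystem.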

Diagnosis: When auto_unmount is passed to the initial supervol mount, the mount is automatically unmounted if the fuse process exits. Unfortunately, only the original supervol mount is unmounted. The bind mount is left in place, with references into the original supervol. This mountpoint continues to see the not connected error.
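As a rough sketch of how the leftover bind mounts could be located, the snippet below parses text in the format of /proc/self/mountinfo and lists mounts whose root within the filesystem is a subdirectory rather than "/". It assumes the gluster fuse mount is reported with filesystem type `fuse.glusterfs`; the function name `find_subdir_mounts` and the sample paths are hypothetical.

```python
def find_subdir_mounts(mountinfo_text, fstype="fuse.glusterfs"):
    """Return (mount_point, root) pairs for `fstype` mounts whose root is not "/".

    In mountinfo, field 4 is the root of the mount within the filesystem and
    field 5 is the mount point; the filesystem type follows the " - " separator.
    A bind mount of a subdirectory shows that subdirectory as its root.
    """
    results = []
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # not a well-formed mountinfo line
        fields = left.split()
        tail = right.split()
        if len(fields) < 5 or not tail:
            continue
        root, mount_point = fields[3], fields[4]
        if tail[0] == fstype and root != "/":
            results.append((mount_point, root))
    return results


# Hypothetical sample: one supervol mount plus one bind-mounted subdirectory.
SAMPLE = """\
22 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
100 22 0:50 / /mnt/supervol rw,relatime shared:60 - fuse.glusterfs server:/supervol rw
101 22 0:50 /pvc-dir /var/lib/kubelet/pods/abc/volumes/vol rw,relatime shared:60 - fuse.glusterfs server:/supervol rw
"""

if __name__ == "__main__":
    for mount_point, root in find_subdir_mounts(SAMPLE):
        print(mount_point, "->", root)
```

On a real node the input would come from reading /proc/self/mountinfo. Each mount point returned is a candidate that may still need to be unmounted separately after the supervol mount has gone away.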