Closed JohnStrunk closed 5 years ago
Background
The gluster supervol is mounted once on a node, generating a single fuse mount (and process). A subdirectory within this mount is then mounted onto the designated location via mount --bind. When the fuse process dies without unmounting, processes that try to access files within either of the two mountpoints get a 'transport endpoint not connected' error.
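A minimal sketch of how a node-side check might detect this condition, assuming the failure surfaces to callers as errno ENOTCONN (the errno behind the "transport endpoint is not connected" message on Linux). The function name is_stale_fuse_mount and the injectable stat_fn parameter are hypothetical, introduced only so the check can be exercised without a real dead mount.

```python
import errno
import os


def is_stale_fuse_mount(path, stat_fn=os.stat):
    """Return True if touching `path` fails with ENOTCONN.

    `stat_fn` is a hypothetical injection point for testing; in normal
    use it defaults to os.stat.
    """
    try:
        stat_fn(path)
    except OSError as exc:
        # A dead fuse process typically shows up as ENOTCONN on access.
        return exc.errno == errno.ENOTCONN
    # The path was reachable, so the mount is not stale in this sense.
    return False
```

The same check would apply to either mountpoint, since both the supervol mount and the bind mount refer to the same fuse filesystem.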
Diagnosis
When auto_unmount is passed to the initial supervol mount, it triggers an automatic unmount if the fuse process exits. Unfortunately, it only unmounts the original supervol mount. The bind mount is left in place, with references into the original supervol. This mountpoint continues to see the 'not connected' error.
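One way to locate bind mounts that were left behind is to scan /proc/self/mountinfo for entries whose root field is a subdirectory rather than /. The sketch below assumes the gluster fuse mount reports its filesystem type as fuse.glusterfs; the function name and the sample paths are hypothetical and only illustrate the parsing.

```python
def find_subdir_bind_mounts(mountinfo_text, fstype="fuse.glusterfs"):
    """Return mountpoints of `fstype` whose root is not '/'.

    Expects text in the /proc/self/mountinfo format:
      id parent major:minor root mountpoint options [optional...] - fstype source superopts
    """
    results = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # The "-" separator follows the six fixed fields and any optional ones.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # malformed or truncated line
        if len(fields) < sep + 2:
            continue
        root, mount_point = fields[3], fields[4]
        if fields[sep + 1] == fstype and root != "/":
            # A non-"/" root means this mount exposes a subdirectory,
            # which is what a bind mount of a supervol subdirectory looks like.
            results.append(mount_point)
    return results


# Hypothetical sample: one supervol mount, one bind mount of a subdirectory.
SAMPLE = """\
36 25 0:40 / /mnt/supervol rw,relatime shared:20 - fuse.glusterfs host:/supervol rw,user_id=0
37 25 0:40 /vol1 /mnt/target/vol1 rw,relatime shared:20 - fuse.glusterfs host:/supervol rw,user_id=0
22 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw
"""

print(find_subdir_bind_mounts(SAMPLE))
```

On a live node the input would come from reading /proc/self/mountinfo. Mountpoints containing spaces are escaped there (for example as \040), so splitting on whitespace is safe for this purpose.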
If the fuse process dies, pods get stuck and won't terminate. The error commonly seen is 'transport endpoint not connected'. The state the system is left in confuses kubelet about the state of the mount.
Investigate whether auto_unmount will provide better error recovery, and implement it if so.