Open humblec opened 5 years ago
@msau42 @jsafrane
if it becomes mandatory as staging path is a directory how about block volumes?
the CO will pass around the staging_target_directory from NodeStageVolume to NodePublishVolume. As a SP, you stage your volume however you like inside the staging_target_directory. For example, you can stage it directly at the starging_target_directory, or a subdirectory or file underneath. You can also store other files/metadata inside the staging_target_directory if you like.
There is a interesting consequence of SP storing metadata under staging target_dir, I am noticing that if stage fails then k8s tries to remove the staging directory - https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_attacher.go#L330 but if it had files, that would fail. It is somewhat messy.. :(
the CO will pass around the staging_target_directory from NodeStageVolume to NodePublishVolume. As a SP, you stage your volume however you like inside the staging_target_directory. For example, you can stage it directly at the starging_target_directory, or a subdirectory or file underneath. You can also store other files/metadata inside the staging_target_directory if you like.
@msau42 thanks, but as mentioned in the problem description , if SP put a directory or file inside the staging target path directory, the removeMountDir
fail which got triggered because of a timeout or similar condition, the main reason being it uses if err := os.Remove(mountPath);
in short os.Remove
. May be os.removeAll
could have helped here. But at present it fails with an error message similar to
Aug 26 15:48:10 ip-10-0-129-7 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount/0001-0009-rook-ceph-0000000000000001-d23fa5e3-c818-11e9-be8f-0a580a830010.
Aug 26 15:48:21 ip-10-0-129-7 hyperkube[1189]: E0826 15:48:21.868121 1189 csi_mounter.go:433] kubernetes.io/csi: failed to remove dir [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount]: remove /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount: directory not empty
Aug 26 15:48:21 ip-10-0-129-7 hyperkube[1189]: E0826 15:48:21.868142 1189 csi_attacher.go:322] kubernetes.io/csi: attacher.MountDevice failed to remove mount dir after errir [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount]: remove /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount: directory not empty
Afaict, all the code paths in attacher
basically make use of staging target path
(.../globalmount
) for its operations and not really the actual staged
path inside the staging target path
for its operations like cleanup..etc in case there was a failure while staging . This looks to be problematic .
One other issue I noticed here is this : In this case ie when timeout of 2 mins occurs while staging was in progress, CO start cleanup process which can fail due to an error like directory not empty
,. At this time, CO think that the highlevel
operation ie staging
itself failed for the workload and it put the workload to a new NODE even if the POD has RWO
volume in it, which cause double staging
( both in old and new node at the same time) to happen for a workload thus Filesystem data corruption..etc.
Are you saying there is a problem with the normal Stage->Publish->Unpublish->Unstage flow? Or there's only a problem when Stage fails? I think it should be up to the SP to make sure to undo/cleanup when they abort an operation
@msau42 in short, https://github.com/kubernetes/kubernetes/issues/82190 covers the issue I was mentioning.
Is there actually a spec issue here, or is this some k8s/CSI implementation issue?
The Staging target path is carved out by CO and passed to SP. From the spec it is not clear that, whether the SP has to stage a volume on this target path only.
Is it mandatory ?
If it is not mandatory, I see there are couple of issues.
1) The SP dont have a mechanism to pass the
new
target path to CO viaNodeStageVolumeResponse
.2) The cleanup operation of stage volume actually happens on the path provided by CO ( ex: Kubernetes) . Here I am referring to
removeMountDir
code in kubernetes.Any thoughts ?