container-storage-interface / spec

Container Storage Interface (CSI) Specification.
Apache License 2.0

Is it mandatory to 'stage' a volume on 'staging target path' provided by CO ? #385

Open humblec opened 5 years ago

humblec commented 5 years ago
// The path to which the volume MAY be staged. It MUST be an
  // absolute path in the root filesystem of the process serving this
  // request, and MUST be a directory. The CO SHALL ensure that there
  // is only one `staging_target_path` per volume. The CO SHALL ensure
  // that the path is directory and that the process serving the
  // request has `read` and `write` permission to that directory. The
  // CO SHALL be responsible for creating the directory if it does not
  // exist.
  // This is a REQUIRED field.
  string staging_target_path = 3;

The staging target path is carved out by the CO and passed to the SP. From the spec it is not clear whether the SP has to stage the volume on exactly this target path.

Is it mandatory?

If it is not mandatory, I see a couple of issues:

1) The SP doesn't have a mechanism to pass a new target path back to the CO via NodeStageVolumeResponse.

2) The cleanup of a staged volume actually happens on the path provided by the CO (e.g. Kubernetes). Here I am referring to the removeMountDir code in Kubernetes:


func removeMountDir(plug *csiPlugin, mountPath string) error {
    klog.V(4).Info(log("removing mount path [%s]", mountPath))

    ....
    if !mnt {
        klog.V(4).Info(log("dir not mounted, deleting it [%s]", mountPath))
        if err := os.Remove(mountPath); err != nil && !os.IsNotExist(err) {
            return errors.New(log("failed to remove dir [%s]: %v", mountPath, err))
        }

        ....

Any thoughts?

humblec commented 5 years ago

@msau42 @jsafrane

Madhu-1 commented 5 years ago

If it becomes mandatory, then since the staging path is a directory, what about block volumes?

msau42 commented 5 years ago

The CO will pass the staging_target_directory from NodeStageVolume to NodePublishVolume. As an SP, you stage your volume however you like inside the staging_target_directory. For example, you can stage it directly at the staging_target_directory, or at a subdirectory or file underneath. You can also store other files/metadata inside the staging_target_directory if you like.

gnufied commented 5 years ago

There is an interesting consequence of the SP storing metadata under the staging target_dir: I am noticing that if staging fails, then k8s tries to remove the staging directory (https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_attacher.go#L330), but if it contains files, that removal will fail. It is somewhat messy.. :(

humblec commented 5 years ago

> The CO will pass the staging_target_directory from NodeStageVolume to NodePublishVolume. As an SP, you stage your volume however you like inside the staging_target_directory. For example, you can stage it directly at the staging_target_directory, or at a subdirectory or file underneath. You can also store other files/metadata inside the staging_target_directory if you like.

@msau42 thanks, but as mentioned in the problem description, if the SP puts a directory or file inside the staging target path, then removeMountDir (triggered by a timeout or similar condition) fails, mainly because it uses os.Remove. Using os.RemoveAll instead could have helped here. At present it fails with an error message similar to:

Aug 26 15:48:10 ip-10-0-129-7 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount/0001-0009-rook-ceph-0000000000000001-d23fa5e3-c818-11e9-be8f-0a580a830010.

Aug 26 15:48:21 ip-10-0-129-7 hyperkube[1189]: E0826 15:48:21.868121    1189 csi_mounter.go:433] kubernetes.io/csi: failed to remove dir [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount]: remove /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount: directory not empty

Aug 26 15:48:21 ip-10-0-129-7 hyperkube[1189]: E0826 15:48:21.868142    1189 csi_attacher.go:322] kubernetes.io/csi: attacher.MountDevice failed to remove mount dir after errir [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount]: remove /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-d18e08c9-c818-11e9-b2cc-0a7451fe601c/globalmount: directory not empty

Afaict, all the code paths in the attacher use the staging target path (.../globalmount) for operations such as cleanup after a staging failure, and not the actual staged path inside the staging target path. This looks problematic.

One other issue I noticed here: when the 2-minute timeout occurs while staging is in progress, the CO starts a cleanup process which can fail with an error like "directory not empty". At that point the CO thinks the high-level operation, i.e. staging itself, failed for the workload, and it moves the workload to a new node even if the pod has an RWO volume. This causes double staging (on both the old and new node at the same time) for one workload, and thus filesystem data corruption, etc.

msau42 commented 5 years ago

Are you saying there is a problem with the normal Stage->Publish->Unpublish->Unstage flow, or is there only a problem when Stage fails? I think it should be up to the SP to make sure to undo/clean up when they abort an operation.

humblec commented 5 years ago

@msau42 in short, https://github.com/kubernetes/kubernetes/issues/82190 covers the issue I was mentioning.

jdef commented 5 years ago

Is there actually a spec issue here, or is this some k8s/CSI implementation issue?