NodeStage/NodeUnstage and NodePublish/NodeUnpublish being called concurrently

gnufied commented 5 years ago

We have come across an issue where the CSI spec does not clarify what happens if NodeUnstageVolume is called while NodeStageVolume is in progress for the same volume, and similarly for NodePublishVolume and NodeUnpublishVolume.

Let's say a user schedules a workload to node A, but NodeStageVolume takes a long time, and before it has a chance to finish, the workload is evicted from node A. Now there are two cases for the volume being staged on node A:

First, NodeStageVolume may just be slow, and the CO can wait for it to finish successfully before calling NodeUnstageVolume. For NodeUnstageVolume, the spec currently says:

This RPC SHALL be called by the CO once for each staging_target_path that was successfully setup via NodeStageVolume.

Second, NodeStageVolume may never succeed (because of an error or topology constraints) and the CO might keep retrying. Is the CO allowed to call NodeUnstageVolume while NodeStageVolume may still be in progress?

I think we need to codify this in a better way.

gnufied commented 5 years ago

cc @jingxu97 @jsafrane @msau42

jsafrane commented 5 years ago

IMO this is covered in the "Timeouts" chapter: https://github.com/container-storage-interface/spec/blob/master/spec.md#timeouts. IMO it applies not only to timeouts but also to similar errors such as interrupted gRPC connections, where the caller cannot be sure how the call ended and must either retry (in most cases) or cancel (when the volume no longer needs to be staged/published).
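The retry-or-cancel rule described above can be sketched from the CO's side. This is only an illustration in Python, not Kubernetes or spec code: `Outcome`, `reconcile_stage`, and the `node_stage`, `node_unstage`, and `still_needed` callables are hypothetical names invented for the example, and the sketch assumes both RPCs are idempotent. It also issues calls one at a time and says nothing about what the SP does with an earlier call that is still running, which is the open question in this issue.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    TIMEOUT = "timeout"  # deadline exceeded or connection lost: result unknown


def reconcile_stage(node_stage, node_unstage, still_needed, max_attempts=3):
    """Drive one volume toward 'staged' or 'unstaged' after uncertain results.

    node_stage / node_unstage: hypothetical callables returning an Outcome.
    still_needed: hypothetical callable, True while the workload needs the volume.
    Returns "staged", "unstaged", or "unknown" when attempts run out.
    """
    for _ in range(max_attempts):
        if still_needed():
            # Volume still wanted: retry the same (assumed idempotent) call.
            if node_stage() is Outcome.OK:
                return "staged"
        else:
            # Volume no longer wanted: issue the negation call to clean up
            # whatever the earlier, possibly unfinished, call left behind.
            if node_unstage() is Outcome.OK:
                return "unstaged"
    return "unknown"
```

For example, if the first NodeStageVolume attempt times out and the workload is evicted in the meantime, the second iteration calls the unstage callable instead of retrying the stage.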

gnufied commented 5 years ago

There are a lot of gray areas in:

In some cases, a CO MAY NOT be able to cancel a pending operation because it depends on the result of the pending operation in order to execute the "negation" call.

and it might be best for the spec to be clearer on this point.

But I think for now we can work with the assumption that NodeUnstage can be issued to cancel an in-progress NodeStage, and similarly NodeUnpublish can be issued to cancel an in-progress NodePublish. The exact semantics of whether an operation can be cancelled, though, depend on what the SP does in its NodeStage and NodePublish calls.

pure-yesmat commented 5 years ago

I agree that this is very unclear. The real issue arises when we cannot "cancel" the NodeStageVolume/NodePublishVolume request: what is the correct thing to return? Can the SP return Pending(Aborted) when NodeUnstageVolume/NodeUnpublishVolume cannot cancel the request? It is not clear from the documentation whether that is a valid option.
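One way an SP might reject the second call is sketched below. The spec's error table does list ABORTED for "Operation pending for volume", but whether that is the intended answer for a NodeUnstageVolume that arrives during NodeStageVolume is not settled in this thread, so treat this as an assumption. The Python below is illustrative only: `OperationPending` stands in for a gRPC ABORTED status, and `InFlight` and `run_exclusive` are hypothetical names, not part of any CSI library.

```python
import threading


class OperationPending(Exception):
    """Stand-in for a gRPC ABORTED status (hypothetical type for this sketch)."""


class InFlight:
    """Tracks volume IDs that currently have a node operation running."""

    def __init__(self):
        self._lock = threading.Lock()
        self._busy = set()

    def try_acquire(self, volume_id):
        # Returns False if another operation already holds this volume.
        with self._lock:
            if volume_id in self._busy:
                return False
            self._busy.add(volume_id)
            return True

    def release(self, volume_id):
        with self._lock:
            self._busy.discard(volume_id)


def run_exclusive(inflight, volume_id, operation):
    """Run operation() unless another call for volume_id is still running."""
    if not inflight.try_acquire(volume_id):
        raise OperationPending(f"operation pending for volume {volume_id}")
    try:
        return operation()
    finally:
        # Always free the volume, even if the operation raised.
        inflight.release(volume_id)
```

With this shape, a NodeUnstageVolume handler wrapped in `run_exclusive` fails fast while a NodeStageVolume for the same volume is running, and the CO can retry once the first call has returned. It does not cancel the in-progress call.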

container-storage-interface / spec

NodeStage/NodeUnstage and NodePublish/NodeUnpublish being called concurrently #389