container-storage-interface / spec

Container Storage Interface (CSI) Specification.
Apache License 2.0
1.34k stars 373 forks source link

Proposal: Provide publish context during ControllerUnpublishVolume #496

Open sumukhs opened 2 years ago

sumukhs commented 2 years ago

Problem Overview:

ControllerPublishVolume has a contract to return FAILED_PRECONDITION with the status indicating the node id that the volume is currently published on. This helps detecting stale ControllerPublishVolume API calls by the CO to the SP controller and take appropriate action.

However, during the controller unpublish, consider the following ordering of events:

  1. CO issues a ControllerUnpublishVolume(node1) to the Plugin.

  2. The above TCP message is delayed indefinitely over the network.

  3. Node on which CO is running experiences a network partition and is not part of a federated cluster anymore. As a result, the CO is now running in a different node that decides to place it on node2. NOTE that the old TCP connection from the CO is still alive to the Plugin.

  4. CO issues ControllerPublishVolume(node2) to the Plugin.

  5. CO then decides to place the volume on node1 again hence issues ControllerUnpublishVolume(node2) subsequently doing a ControllerPublishVolume(node1).

  6. Finally, the delayed TCP message from 2 arrives indicating a ControllerUnpublishVolume(node1)

Expectation:

The Plugin must be able to identify the stale API call that comes in step 6 and ignore it.

Proposal:

If the ControllerUnpublishVolume includes the original publish context that was returned as part of the ControllerPublishVolume, the Plugin/SP can sequence these and detect staless.

jdef commented 2 years ago

The controller-unpublish call does not relay state on purpose. Consider that the CO is braindead and loses context. It should still be possible to unpublish.

I'm sympathetic to the partitioning problem, and think that the CO should do more to help plugins deal with such.

-1 on this proposal.

On Fri, Nov 5, 2021, 4:36 PM Sumukh Shivaprakash @.***> wrote:

Problem Overview:

ControllerPublishVolume has a contract to return FAILED_PRECONDITION with the status indicating the node id that the volume is currently published on. This helps detecting stale ControllerPublishVolume API calls by the CO to the SP controller and take appropriate action.

However, during the controller unpublish, consider the following ordering of events:

1.

CO issues a ControllerUnpublishVolume(node1) to the Plugin. 2.

The above TCP message is delayed indefinitely over the network. 3.

Node on which CO is running experiences a network partition and is not part of a federated cluster anymore. As a result, the CO is now running in a different node that decides to place it on node2. NOTE that the old TCP connection from the CO is still alive to the Plugin. 4.

CO issues ControllerPublishVolume(node2) to the Plugin. 5.

CO then decides to place the volume on node1 again hence issues ControllerUnpublishVolume(node2) subsequently doing a ControllerPublishVolume(node1). 6.

Finally, the delayed TCP message from 2 arrives indicating a ControllerUnpublishVolume(node1)

Expectation:

The Plugin must be able to identify the stale API call that comes in step 6 and ignore it. Proposal:

If the ControllerUnpublishVolume includes the original publish context that was returned as part of the ControllerPublishVolume, the Plugin/SP can sequence these and detect staless.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLBTVPDVDLJRE6OI2UDUKRE6ZANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

sumukhs commented 2 years ago

Isn't the co storing/plumbing around the volume id and volume context already?

I didn't get how that is different from the publish context.

If the co is brain dead and loses all its context, how would it be able to controller unpublish anyway?

jdef commented 2 years ago

The only required field for controller-unpublish is volume_id.

On Fri, Nov 5, 2021, 6:40 PM Sumukh Shivaprakash @.***> wrote:

Isn't the co storing/plumbing around the volume id and volume context already?

I didn't get how that is different from the publish context.

If the co is brain dead and loses all its context, how would it be able to controller unpublish anyway?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496#issuecomment-962261856, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLDD3TAGUYWV4B3SERDUKRTO3ANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

sumukhs commented 2 years ago

Isn't the assumption that the volume_id is also lost though?

CO would still have to go through the flow of all the idempotent calls like CreateVolume (which gives back the volume_id), then ControllerPublishVolume(which being idempotent can give back the publish context again) and finally invoke the ControllerUnpublishVolume(publish_context)?

sumukhs commented 2 years ago

Unless you are referring to manual intervention by a human who is trying to patch this up and wants to unpublish the volume - I see your point then.

jdef commented 2 years ago

Correct. A human should be able to provide volume ID.

On Fri, Nov 5, 2021, 7:02 PM Sumukh Shivaprakash @.***> wrote:

Unless you are referring to manual intervention by a human who is trying to patch this up and wants to unpublish the volume - I see your point then.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496#issuecomment-962268850, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLG27DPOONBTNINNE2TUKRWAZANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

sumukhs commented 2 years ago

That makes sense. Any suggestions on how this can be tackled or pointers to other approaches are appreciated!

Going back to your original response, what did you imply by "CO must do more". A CO doing anything more wouldn't be csi conformant and hence lose the flexibility of plugging in different SP's right?

jdef commented 2 years ago

CO must do more:

There's another csi proposal that deals with partitioning and attempts to implement a fencing strategy within csi - I commented there as well. The CO bears more responsibility (for complexity) in such cases so that plug-in implementations (and the csi spec) remain simpler.

It sounds like you're trying to address an important problem with a minimal spec change, and in general I'm supportive of such surgical approaches. However, in this case, the proposal violates a principal that we established early on re: braindead CO and volume cleanup operations.

On Fri, Nov 5, 2021, 7:31 PM Sumukh Shivaprakash @.***> wrote:

That makes sense. Any suggestions on how this can be tackled or pointers to other approaches are appreciated!

Going back to your original response, what did you imply by "CO must do more". A CO doing anything more wouldn't be csi conformant and hence lose the flexibility of plugging in different SP's right?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496#issuecomment-962278316, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLELQS2E7WLPF756JF3UKRZORANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

sumukhs commented 2 years ago

Yes, for the apis in csi that I cared about (create,controller publish/unpublish, node stage/unstage, node publish/unpublish, delete), it seems like we can deal with a network partition for all the cases except the controller unpublish and hence the question here.

I can take a look at the other proposal to see how the co could do more to handle this situation - can you point me to the issue number?

The way I think of a co, in spite of all its complexity and how many ever intermediate states it persists, a network partition causing a stale api call to arrive at a plugin during unpublish will not be distinguished because the spec only allows for a volume id.

I can think of passing in an optional "etag" that if absent will be treated as the safety net/human intervention where the call will be honored and the unpublish will be accepted.

However, if the value is set to something (which will be the case in the stale api call), it is ignored due to the etag checks. It is similar to database optimistic concurrency. Would this address the drawback of the principles of csi you pointed out in my original proposal?

sumukhs commented 2 years ago

@jdef - gentle ping!

jdef commented 2 years ago

Not sure why I'm pinged here. Afaict this issue / ticket has been settled.

On Tue, Nov 9, 2021, 12:27 AM Sumukh Shivaprakash @.***> wrote:

@jdef https://github.com/jdef - gentle ping!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496#issuecomment-963829899, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLB4I6BBOTOCUPHCNU3ULCWMBANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

sumukhs commented 2 years ago

I can think of passing in an optional "etag" that if absent will be treated as the safety net/human intervention where the call will be honored and the unpublish will be accepted.

However, if the value is set to something (which will be the case in the stale api call), it is ignored due to the etag checks. It is similar to database optimistic concurrency. Would this address the drawback of the principles of csi you pointed out in my original proposal?

Wondering what your take on this approach is:

jdef commented 2 years ago

I think I misread the original proposal, mistakenly confusing volume-context for publish-context. That said...

"etag" sounds like additional state that: (a) CO would need to track/remember as a result of controller-publish-volume, so that... (b) CO would present to the controller-unpublish-volume call

"etag" seems like "special publish volume context". Let's avoid special case things like this.

So let's say that publish-context was added as an always-OPTIONAL field for controller-unpublish-volume:

message ControllerUnpublishVolumeRequest {
...
  map<string,string> publish_context = x; // OPTIONAL, always
}

... and the new RPC semantics become ...

when publish-context is empty: (a) because the plugin never provided one as a result of controller-publish-volume; no change from today's behavior (b) because the CO forgot the old value; no change from today's behavior (c) because the CO implements an older version of CSI, it is not aware of publish-context for this unpublish call; no change in today's behavior

when publish-context is non-empty: (d) the CO remembered it from the publish RPC, and ... (d.1) it matches what the plugin expects; no change from today's behavior (d.2) it does NOT match what the plugin expects; new behavior: RPC results in FAILED_PRECONDITION (e) the plugin is mid-upgrade, the older plugin version doesn't process the non-empty publish-context (because it implements an older CSI spec); no change from today's behavior

At first glance, this seems to be backwards compatible and the new behavior is completely opt-in from a plugin perspective.

I'm curious - did you actually run into this situation in a live cluster, or this proposal born of hypotheticals?

sumukhs commented 2 years ago

At first glance, this seems to be backwards compatible and the new behavior is completely opt-in from a plugin perspective.

Yes, given its optional behavior it would be a non-breaking change for existing plugin's/CO's. Is the expectation for the feature proposer to send out a PR/do one of the maintainers would take care of incorporating this in the next release?

I'm curious - did you actually run into this situation in a live cluster, or this proposal born of hypotheticals?

We did not hit this in a live cluster but we have a mock CO and CSI simulator that exercised this code path and ran into a coding assert.

jdef commented 2 years ago

Kindly submit a PR. If this is your first, you probably need to sign the CLA, otherwise PR cannot be accepted.

On Tue, Nov 9, 2021, 10:44 AM Sumukh Shivaprakash @.***> wrote:

At first glance, this seems to be backwards compatible and the new behavior is completely opt-in from a plugin perspective.

Yes, given its optional behavior it would be a non-breaking change for existing plugin's/CO's. Is the expectation for the feature proposer to send out a PR/do one of the maintainers would take care of incorporating this in the next release?

I'm curious - did you actually run into this situation in a live cluster, or this proposal born of hypotheticals?

We did not hit this in a live cluster but we have a mock CO and CSI simulator that exercised this code path and ran into a coding assert.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/container-storage-interface/spec/issues/496#issuecomment-964274434, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAR5KLDZRIJZQB5DCOBJOCDULE6XLANCNFSM5HOU3QLQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.