container-storage-interface / spec

Container Storage Interface (CSI) Specification.
Apache License 2.0
1.34k stars 373 forks source link

Update spec doc for GroupController service #545

Open carlory opened 1 year ago

carlory commented 1 year ago

What type of PR is this?

/kind document

Special notes for your reviewer:

Does this PR introduce an API-breaking change?:

none
carlory commented 1 year ago

/cc @xing-yang @jdef @saad-ali @bswartz

carlory commented 1 year ago

A new change is that the RPC Interactions section is added.

cc @jdef @xing-yang

carlory commented 1 year ago

/cc @xing-yang

carlory commented 1 year ago

@xing-yang updated. please review again.

bswartz commented 1 year ago

How do we mark a snapshot as "broken" or "partial"? Currently we don't have a way to do that and communicate back to CO.

I don't necessarily recommend this, but we could add an additional boolean field in the response message to indicate it.

It's a unique problem for group snapshots because if you have multiple volumes and most of the snapshots were okay, but one cannot succeed, there are good arguments both to preserve the partial group snapshot and arguments to delete it and try over again. My personal inclination would to be to delete it -- in which case the SP could do that work with no RPC changes. But if we return success with a "partial" boolean set then the CO could also delete it, while also having the option to not delete it.

saad-ali commented 1 year ago

Confirmed @carlory signed CSI CLA

saad-ali commented 8 months ago

How do we mark a snapshot as "broken" or "partial"? Currently we don't have a way to do that and communicate back to CO.

I don't necessarily recommend this, but we could add an additional boolean field in the response message to indicate it.

It's a unique problem for group snapshots because if you have multiple volumes and most of the snapshots were okay, but one cannot succeed, there are good arguments both to preserve the partial group snapshot and arguments to delete it and try over again. My personal inclination would to be to delete it -- in which case the SP could do that work with no RPC changes. But if we return success with a "partial" boolean set then the CO could also delete it, while also having the option to not delete it.

Per @xing-yang should fail and clean up if even one snapshot in group fails because they have to be consistent.
Per @bswartz if SP gets 9/10 it is responsible for error handling.

@xing-yang will take another look at this PR.