JohnStrunk opened this issue 6 years ago
I propose that we implement gluster-block CSI along the lines of https://github.com/gluster/gluster-csi-driver/issues/10#issuecomment-428669278
If we do it right, it should be a much cleaner approach that keeps the changes to the existing stacks minimal!
We need input from people who already have experience with this: @SaravanaStorageNetwork @raghavendra-talur @humblec @obnoxxx ...
Finish the implementation of gluster/gluster-block#140 and build 'gluster-blockd + gluster-block-cli' as a separate container.
Nothing.
Implement a CSI driver for gluster-block, reusing the code from heketi and keeping the API largely the same.
Since the API stays largely the same, the StorageClass definition referenced by the PVC can also stay similar, so the tests that already exist can be reused for validation (see the StorageClass sketch below).
@pkalever @lxbsz @pranithk @Madhu-1 @aravindavk let me know your thoughts too.
Adding to this:
This may be required if we don't want to 'ssh' into the gluster-blockd container to execute the gluster-block CLI.
This can be done after we make all the other build-system changes and validate that they work as expected in the build pipeline!
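To make the "keep the StorageClass similar" point concrete, here is a minimal sketch of what the class and a PVC could look like. This is only an illustration under assumptions: the driver name, parameter keys, and endpoint below are placeholders, not a settled API.

```yaml
# Hypothetical StorageClass for a gluster-block CSI driver; parameter names are
# modeled loosely on the heketi-era glusterblock provisioner and are not final.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterblock-csi
provisioner: org.gluster.glusterblock          # assumed driver name
parameters:
  resturl: "http://glusterd2-client.gcs:24007" # management endpoint (gd2 or heketi)
  hacount: "3"                                 # number of target portals / HA paths
reclaimPolicy: Delete
---
# A PVC stays backend-agnostic; it only names the class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: glusterblock-csi
  resources:
    requests:
      storage: 10Gi
```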
> I propose the plan of doing gluster-block CSI like gluster/gluster-csi-driver#10 (comment)
I think it is reasonable to have (all) the provisioning handled by the CSI block driver. I'm still going to suggest separate drivers for heketi/gd2 so that we can avoid multiple codepaths and not pollute the StorageClass with a `whichBackend: gd2` :smile: parameter. Instead, the choice would be handled at deploy time by starting the correct driver.
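As a rough sketch of that deploy-time choice (the driver names here are assumptions, not anything decided):

```yaml
# The backend is implied by which driver is deployed, not by a StorageClass parameter.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterblock
provisioner: org.gluster.glusterblock-gd2   # a gd1/heketi deployment would instead ship
                                            # something like org.gluster.glusterblock-heketi
```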
My main concern is that we think through all the request flows and make sure we can have g-b in its own DS that is separate from gd2 pods. We will also need to see if the DS will still need to use host network.
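To make that concrete, a standalone gluster-blockd DaemonSet would presumably start out looking roughly like the sketch below (the image name and privileges are assumptions); whether hostNetwork and privileged access can be dropped is exactly the open question.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gluster-blockd
spec:
  selector:
    matchLabels:
      app: gluster-blockd
  template:
    metadata:
      labels:
        app: gluster-blockd
    spec:
      hostNetwork: true                        # open question: can the iSCSI target work without this?
      containers:
      - name: gluster-blockd
        image: gluster/gluster-blockd:latest   # assumed image name
        securityContext:
          privileged: true                     # tcmu-runner/LIO needs access to kernel target state
```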
First of all, such design discussions are imho better held in a PR with a design document than in issues, since issues eventually scatter the content and decisions over many comments. A PR with a doc always has the latest state in one place... And agreement results in merging, so you have a place in the repo to go and check what was agreed upon.
Now for the design discussion:
The old stack (heketi/gd1) has the high level block-volume management functionality in heketi, in parallel to the volume functionality. It is also using the volume functionality to automagically manage the block-hosting volumes transparently in the background (if enabled). Here is the original end-to-end design for gluster-block in the old stack:
https://github.com/gluster/gluster-kubernetes/blob/master/docs/design/gluster-block-provisioning.md
Now questions for this new approach:
> Why the requirement to separate the gluster and gluster-block containers at this stage?
The gluster server pods are no longer a DaemonSet and no longer use the host network. As such, if the backing storage supports it, these pods can float around the cluster as necessary to maintain availability. This gives us the potential to use kubernetes' built-in functionality to increase our resiliency. The iSCSI target is kernel code, making the above changes impractical for the g-b functionality (though I would love somebody to prove me wrong).
> Why put the block-hosting-volume creation logic into the CSI driver and not into gd2?
g-b is just an application that uses gluster file storage (and happens to re-export that capacity). Separating the layers cleanly provides dev, maintenance, and deployment advantages. If the "setup" for g-b were added to gd2 via plugin or core functionality, it would mean g-b and gd2 development would be tied together. Since there are no other users of this functionality and the same tasks can be completed via the existing gd2 provisioning API, intertwining the two components seems to be a layer violation by pulling application functionality (g-b volume setup) into the lower-level storage system.
I understand the impulse to move all of heketi's functionality into gd2, but we are really just trying to move the functionality out of heketi (either up or down), choosing a location that makes the most sense for the architecture we're trying to build.
@obnoxxx @JohnStrunk Thanks for your thoughts!
For now, after further discussion with @Madhu-1 and @aravindavk, who have already done some work on a block CSI driver and a gd2 plugin for g-b, it looks like the gd2 plugin for block would be 'quicker', as @obnoxxx mentions. It also means keeping the g-b API the same in the gd2 plugin, so a CSI driver written against it would be essentially the same for both gd2-based containers and heketi/gd1-based containers.
But I agree with @JohnStrunk that this would create a dependency on gluster-block in gd2 where it need not have one. Also, while packaging, I noticed that at least the gluster-block CLI would have to be packaged in the gd2 container, even if we host gluster-blockd as a separate container; note that gluster-block currently doesn't build separate RPMs for these. With a separate CSI driver handling gluster-block, the gluster-block package would only need to be installed in the CSI driver.
gd2 pods don't use host networking anymore. If g-b gets packaged together with them, it would also not be using the host network. Will it work in such an environment?
WIP design if we choose to integrate with Glusterd2 itself: https://github.com/gluster/glusterd2/pull/1319
We need to document how we plan to integrate gluster-block w/ GCS in order to identify gaps that need to be closed for GCS-1.0.
Gluster pods are no longer DaemonSets, nor do they use host networking. Once we switch to block PVs, we can remove the nodeAffinity that locks the pods to a specific node. My understanding is that movable pods will not work for gluster-block currently.
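For reference, the node pinning in question is roughly of this shape (values are illustrative); once the bricks sit on block PVs that can follow the pod, this stanza could go away:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1                             # pinned to the node holding the brick's local storage
```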