Gluster Operator - Initial Design

jarrpa commented 6 years ago

A design doc to hopefully get some traction on the development of a Gluster operator. From the introduction:

This document aims to settle on a concrete set of functionality that a Gluster
operator should have. It is outside the scope of this document to specify any
implementation details of this functionality. The desired functionality should
not be strictly limited by the extents of current-day technologies but should
not dismiss the costs of waiting for and executing forward development in those
technologies.

In addition, this document only specifies the desired functionality for
declaring a 0.1 release. Further enhancements and feature roadmaps will be
captured elsewhere.

Signed-off-by: Jose A. Rivera <jarrpa@redhat.com>

travisn commented 6 years ago

You might consider adding the CRD spec to the design doc. This will help define the desired state the admin can specify when he creates the gluster cluster. It will also help differentiate between what is desired state managed by the operator vs configuration tasks that the operator likely wouldn't manage.

For the configuration tasks, consider that gluster should have a "toolbox" pod that allows the admin to drop in and perform operational tasks manually. The toolbox allows running tasks that the operator might not be expected to run, either because they are one-time admin tasks or because the admin just needs to override the operator.

jarrpa commented 6 years ago

@travisn

> You might consider adding the CRD spec to the design doc. This will help define the desired state the admin can specify when he creates the gluster cluster. It will also help differentiate between what is desired state managed by the operator vs configuration tasks that the operator likely wouldn't manage.

I would think the CRD spec would be too close to the implementation of the operator to merit including in this document. As it stands, the CRD in my prototype has evolved over the course of the implementation as I think things through and try things out.

> For the configuration tasks, consider that gluster should have a "toolbox" pod that allows the admin to drop in and perform operational tasks manually. The toolbox allows running tasks that the operator might not be expected to run, either because they are one-time admin tasks or because the admin just needs to override the operator.

Wouldn't that just be the same as exec'ing into one of the Gluster nodes?

travisn commented 6 years ago

@jarrpa great, if the tools are all available in the gluster nodes, no need for a separate toolbox.

What I'm looking for is a declaration of desired state that can be controlled by the admin. As is, the doc makes it sound like the operator will be able to automatically maintain the health, grow the cluster, and take other actions without any input from an admin. Since you're using a CRD, you must have desired state that controls this; I just can't see what it is. If the CRD is too detailed, perhaps an abstraction of the CRD?

jarrpa commented 6 years ago

@travisn Ah, okay. The desired state is only changed by the operator in the case of a managed deployment, where both the services and the storage devices are not tied to specific hosts. Otherwise it is up to the administrator to change the CRD object.

I can see the rationale for some general state outline, at least. I'll work on that.

phlogistonjohn commented 6 years ago

Perhaps this is too implementation-oriented, but I think it would be nice if the ability to make changes to the cluster were individually toggleable. For example, you might be working with a "managed" cluster and not want it to automatically perform cluster scaling, while storage scaling might be OK.

IOW, are the target configurations in the doc more like use cases or something that would actually be specified in the system configuration?

jarrpa commented 6 years ago

@phlogistonjohn Definitely more implementation-level than I want to cover here. But I fully intend to make things highly configurable in my implementation. ;)

jarrpa commented 6 years ago

Updated the PR:

- Added a description of state information that the operator will maintain.
- Removed the "cluster scaling" feature.

jarrpa commented 6 years ago

Pushed updates:

- Changed desired version from 1.0 to 0.1.
- Removed the notion of "automated Gluster clusters". I already assume elsewhere that when I say "Gluster cluster", the reader knows I'm talking about those Gluster clusters that are deployed and automated by the operator.

gluster / gluster-kubernetes

Gluster Operator - Initial Design #476