gluster / anthill

A Kubernetes/OpenShift operator to manage Gluster clusters
http://gluster-anthill.readthedocs.io/
Apache License 2.0

Add a proposal for the Gluster CRDs #40

Closed JohnStrunk closed 5 years ago

JohnStrunk commented 5 years ago

Describe what this PR does

This PR provides a proposed set of CRs for the Gluster operator. While this proposal encompasses much more than what an initial implementation will support, my goal is to make room for the features we know we will eventually need, so we can incorporate them over time while maintaining compatibility of the CRD.

Is there anything that requires special attention?

While I expect this proposal to be fairly contentious, please try to offer concrete suggestions for improvement along with the criticisms. :pray:

Related issues: Closes #3

JohnStrunk commented 5 years ago

Folks who should take a look:

jarrpa commented 5 years ago

An off-hand remark to start after my first skim: while "quiesce" is a technically appropriate term, I think it's a little too uncommon and hard to quickly understand. I think either "inactive" or "disabled" would be better.

JohnStrunk commented 5 years ago

Here's something I'd like opinions on: Currently, there is both a Cluster CR and a Zone CR that are mandatory. In many (most?) environments, the number of Zones would be small (1-3). Should we move the information of the Zone CR into the Cluster CR as a list? Imagine adding the following to the spec section of the Cluster CR (taking the place of 3x Zone CRs in the AWS multi-az example):

zones:
  - name: az1
    management:
      mode: automatic
    automatic:
      minNodes: 3
      maxNodes: 99
      maxStorage: 100Ti
      freeStorageMin: 500Gi
      freeStorageMax: 2Ti
    storage:
      mode: volume
      volume:
        size: 1Ti
        storageClass: ebs-az1
  - name: az2
    management:
      mode: automatic
    automatic:
      minNodes: 3
      maxNodes: 99
      maxStorage: 100Ti
      freeStorageMin: 500Gi
      freeStorageMax: 2Ti
    storage:
      mode: volume
      volume:
        size: 1Ti
        storageClass: ebs-az2
  - name: az3
    management:
      mode: automatic
    automatic:
      minNodes: 3
      maxNodes: 99
      maxStorage: 100Ti
      freeStorageMin: 500Gi
      freeStorageMax: 2Ti
    storage:
      mode: volume
      volume:
        size: 1Ti
        storageClass: ebs-az3

Whatever would have gone into the status of the Zone CR could be placed into a corresponding list in the Cluster CR's status.

The benefit to the above change would be that the admin need only create a single CR to get a cluster up and running (or show their configuration). The down-side is that the CR is bigger w/ more fiddly indentation.

jarrpa commented 5 years ago

@JohnStrunk I say the ease of having only one CR is worth the increased length. In particular, I see the majority of users only having one Zone anyway.

phlogistonjohn commented 5 years ago

First off, this is a really awesome start! :-)

I have a couple questions based on my first read-through:

(Edit: The next two points I wrote overlap a bit with what @JohnStrunk and @jarrpa were discussing above. Apologies for any redundancy, but rather than interject into that subthread I'll leave what I wrote intact but admit that I didn't read through all that when I wrote my comments)

To expand on my "multiple named subsections" idea above: I imagine we could define the storage section with something like:

  storage:
    # the default storage policy
    default:
      mode: device
      device: 
        - /dev/sdb
        - /dev/sdc
    # storage policy to be used on oldservers (TBD how oldservers are matched with the pods)
    oldservers:
      mode: device
      device:
        - /dev/sdc
        - /dev/sdd
        - /dev/sde

If we wanted to do per node storage we could have a key like "storageOverride".
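Roughly, a per-node override might look something like this (the storageOverride key and the node-matching rule are hypothetical, just to show the shape of the idea):

  storage:
    # the default storage policy
    default:
      mode: device
      device:
        - /dev/sdb
  # hypothetical key; how a node name gets matched is TBD
  storageOverride:
    node-7:
      mode: device
      device:
        - /dev/sdd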

jarrpa commented 5 years ago

@phlogistonjohn I would modify your proposal to have spec.storage be a list, and each item in storage have a name key to identify. Then you could have spec.zone.storage be a list of names of storage that apply to that zone.

More broadly, to account for legacy scenarios where nodes have asymmetric storage topology, we could go as far as just having one Zone per node such that we can specify exact devices for each node. It's an abuse of the intent of the Zone, but one that's technically feasible given that Zones are (to the Operator) just logical distinctions and have no technical underpinnings (as far as Gluster itself is concerned).

phlogistonjohn commented 5 years ago

I would modify your proposal to have spec.storage be a list, and each item in storage have a name key to identify. Then you could have spec.zone.storage be a list of names of storage that apply to that zone.

I understand using a list, it's six of one, half dozen of the other for me. I'm not sure what the 2nd sentence is getting at though. Could you provide an example of what you mean?

More broadly, to account for legacy scenarios where nodes have asymmetric storage topology, we could go as far as just having one Zone per node such that we can specify exact devices for each node. It's an abuse of the intent of the Zone, but one that's technically feasible given that Zones are (to the Operator) just logical distinctions and have no technical underpinnings (as far as Gluster itself is concerned).

That sounds a bit too hacky/clever to me. If I really thought asymmetry in the storage devices of the nodes would be a legacy thing, you might be able to convince me. However, I really do think these sorts of asymmetries will occur naturally in the real world as clusters age and grow. If we need a strict matching of device layout to abstraction, it seems that maybe we need an additional level of abstraction between node and zone. This may not need its own CRD, but it would prevent us from overloading the concept of zone beyond the natural understanding that users will have (roughly an Availability Zone), IMO.

jarrpa commented 5 years ago

I understand using a list, it's six of one, half dozen of the other for me. I'm not sure what the 2nd sentence is getting at though. Could you provide an example of what you mean?

A list would be more graceful in implementation, IMO. :) As far as the second sentence, think like volumes and volumeMounts in kube Pods. To put the two together, something like this:

spec:
  management:
    mode: external
    external:
      manual:
        nodes: 2
        storage:
        - storage1
  storage:
  - name: storage1
    mode: device
    device: 
      - /dev/sdb
      - /dev/sdc
  - name: storage2
    mode: device
    device:
      - /dev/sdc
      - /dev/sdd
      - /dev/sde

More broadly, to account for legacy scenarios where nodes have asymmetric storage topology, we could go as far as just having one Zone per node such that we can specify exact devices for each node. It's an abuse of the intent of the Zone, but one that's technically feasible given that Zones are (to the Operator) just logical distinctions and have no technical underpinnings (as far as Gluster itself is concerned).

That sounds a bit too hacky/clever to me. If I really thought asymmetry in the storage devices of the nodes would be a legacy thing, you might be able to convince me. However, I really do think these sorts of asymmetries will occur naturally in the real world as clusters age and grow. If we need a strict matching of device layout to abstraction, it seems that maybe we need an additional level of abstraction between node and zone. This may not need its own CRD, but it would prevent us from overloading the concept of zone beyond the natural understanding that users will have (roughly an Availability Zone), IMO.

In that case, we could expand the Zone spec to allow multiple entries in spec.management. To expand on the above:

spec:
  management:
  - mode: external
    external:
      manual:
        nodes: 2
        storage:
        - storage1
  - mode: automatic
    automatic:
      minNodes: 2
      maxNodes: 99
      storage:
      - storage2
  storage:
  - name: storage1
    mode: device
    device: 
      - /dev/sdb
      - /dev/sdc
  - name: storage2
    mode: volume
    volume:
      size: 500Gi
      storageClassName: my-az1-class

I'm not sure how likely it is to have mixed-management-mode clusters, but it should be technically feasible. More likely, as you said, you'd only use this for external clusters that grow over time.

JohnStrunk commented 5 years ago

Following up on some of the questions/discussion so far...

Since my question about merging the Zone w/ the Cluster got a positive response, I'm going to write up and push that change.

@phlogistonjohn You asked about automatic/manual/external... (I think you figured it out, so this is more for the benefit of others following along.) The idea is that there are two ways to run: in the kube cluster as pods, or connecting to an external gluster cluster. "automatic" and "manual" are both for running as pods; the difference is just that in auto mode the operator can create & destroy nodes at will, whereas in manual mode it maintains a fixed number of nodes in that zone. External mode is for using an external cluster. In that case, you have to tell the operator where to find the nodes, and you also have to manage them yourself (create/destroy/upgrade). Perhaps a better naming scheme would be: dynamic, fixed, external?
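As a rough reference, the three modes side by side (a sketch using field names from the proposal; only one sub-section would actually be set per zone):

management:
  # exactly one of the following would be specified
  automatic:            # "dynamic": operator may create and destroy nodes at will
    minNodes: 3
    maxNodes: 99
  manual:               # "fixed": operator maintains a fixed node count
    nodes: 4
  external:             # admin-managed nodes; operator is only told where to find them
    nodes:
      - 1.2.3.4
      - some.node.net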

I, too, have a dislike for the asymmetry of the "external" nodes in the Zone. I'm trying to come up with a way to push the device list down to the Node CR (so we can use device UUIDs), and convert what is now at the Zone level into a "template" (as that's what it actually is). Perhaps with some re-arranging, it's possible to make the "external" nodes have the contact info in Node CRs, and have corresponding Zones with a "custom" setting (i.e., child Nodes are created manually and don't follow a template). :thinking:

For storage.mode: device, there's just no safe way to use /dev/sd*. I was trying to figure out a way to, on first start, record the UUID of the device, but there's no guarantee that even on 1st start the devices are what the admin thinks they are. So, I'm starting to be rather pessimistic about finding a way to templatize mode=device at all.

phlogistonjohn commented 5 years ago

In that case, we could expand the Zone spec to allow multiple entries in spec.management. To expand on the above:

I think I like this, although I think this "thing" needs some sort of name so we can talk about it. (Not entirely serious suggestions: corrals)

@phlogistonjohn You asked about automatic/manual/external... (I think you figured it out, so this is more for the benefit of others following along)

Actually, I hadn't, so this is great.

The idea is that there are two ways to run: in the kube cluster as pods, or connecting to an external gluster cluster. "automatic" and "manual" are both for running as pods; the difference is just that in auto mode the operator can create & destroy nodes at will, whereas in manual mode it maintains a fixed number of nodes in that zone. External mode is for using an external cluster. In that case, you have to tell the operator where to find the nodes, and you also have to manage them yourself (create/destroy/upgrade). Perhaps a better naming scheme would be: dynamic, fixed, external?

I agree a better name would help make the concept clearer. I also think this is akin to the current approach taken by gk-deploy and openshift-ansible, no? If so, the fixed count threw me a little bit. IIRC the current style is based on node labeling. If I'm not too far off base, would it make sense to make that counter a minimum value (used to assist with upgrades/reboots) plus an optional label? Perhaps a list of starter nodes? As for naming, maybe something like "predefined" or "restricted"?
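Something like this, perhaps (both minNodes-under-manual and nodeLabel are hypothetical keys, only to illustrate the counter-as-minimum plus optional label idea):

management:
  mode: manual
  manual:
    minNodes: 3                        # hypothetical: a floor rather than a fixed count
    nodeLabel: glusterfs=storage-host  # hypothetical: label used to select eligible nodes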

I, too, have a dislike for the asymmetry of the "external" nodes in the Zone. I'm trying to come up with a way to push the device list down to the Node CR (so we can use device UUIDs), and convert what is now at the Zone level into a "template" (as that's what it actually is). Perhaps with some re-arranging, it's possible to make the "external" nodes have the contact info in Node CRs, and have corresponding Zones with a "custom" setting (i.e., child Nodes are created manually and don't follow a template). thinking

+1 to the sentiment. I think what might be helpful here is to imagine what the user workflow would be. Since it's not particularly good UX to make someone type a lot of boilerplate, we could have a sort of initial template (initialNodes?) buried in the zone, but one that always expands out to Node CRs, and post-install we always manage via the Nodes?
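For instance, the zone could carry something like this (initialNodes is a hypothetical key), which the operator would expand into Node CRs on first reconcile and never consult again:

zones:
  - name: az1
    initialNodes:        # hypothetical: only read when the zone is first created
      - devices:
          - /dev/sdb
          - /dev/sdc
      - devices:
          - /dev/sdc
          - /dev/sdd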

For storage.mode: device, there's just no safe way to use /dev/sd*. I was trying to figure out a way to, on first start, record the UUID of the device, but there's no guarantee that even on 1st start the devices are what the admin thinks they are. So, I'm starting to be rather pessimistic about finding a way to templatize mode=device at all.

Hehe. This mirrors a discussion I have been part of in the heketi repo. I'm convinced that normal people think of their devices as /dev/sdX etc., but that's not "safe" for long-term use. I think we're relatively safe if we are up front about saying these device nodes are what's used for initial discovery, not long-term use. I think it could be sufficient if the operator only uses these devices to feed the next step in the management chain (e.g. heketi or gd2) and then relies on those subsystems to come back to the operator with a stable identifier?

jarrpa commented 5 years ago

I've finally taken the time to do a more thorough review of the proposal. Below are my thoughts, as well as some repetition of prior discussion:

Some examples that put together the above:

Minimalist Cluster CR

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterCluster
metadata:
  name: my-cluster
  namespace: gcs
spec:
  zones:
  - name: az1
    nodes:
    - nodes-dynamic
  nodeTemplates:
  - name: nodes-dynamic
    dynamic:
      minNodes: 2
      maxNodes: 99

Complex Cluster CR

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterCluster
metadata:
  name: my-cluster
  namespace: gcs
spec:
  clusterOptions:
    "cluster.halo-enabled": "yes"
  glusterCA: # optional
    name: ca-secret
    namespace: gcs # optional, defaults to GlusterCluster's namespace
  zones:
  - name: az1
    nodes:
    - nodes-dynamic
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - us-west-2a
  - name: az2
    nodes:
    - nodes-dynamic
    disabled: True
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
                - us-east-2a
  - name: az3
    nodes:
    - nodes-manual
    - nodes-external
  nodeTemplates:
  - name: nodes-dynamic
    dynamic:
      minNodes: 2
      maxNodes: 99
      storage:
      - storage1
  - name: nodes-manual
    manual:
      nodes: 2
      storage:
      - storage2
    stateVolume:
      storageClassName: manual-state-class
  - name: nodes-external
    external:
      nodes:
      - 1.2.3.4
      - some.node.net
      credentialsSecret: # required
        name: ssh-secret
        namespace: gcs # optional, defaults to the GlusterCluster's namespace
  storageTemplates:
  - name: storage1
    volumes:
    - size: 500Gi
      storageClassName: az1-class
      minCount: 1
      maxCount: 100
  - name: storage2
    devices: 
    - /dev/sdb
    - "/dev/sd[d-f]"

Examples for the above:

Node CR - Dynamic (created by Operator, immutable except for spec.desiredState)

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterNode
metadata:
  name: az1-nodes-dynamic-001
  namespace: gcs
  cluster: my-cluster
  zone: az1
  nodeGroup: nodes-dynamic
spec:
  desiredState: active
status:
  currentState: active

Node CR - Manual (created by Operator, mutable by admin)

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterNode
metadata:
  name: az1-nodes-manual-001
  namespace: gcs
  cluster: my-cluster
  zone: az3
  nodeGroup: nodes-manual
spec:
  storage:
  - volumes: 
    - claimName: block-pvc
      claimNamespace: gcs # optional, defaults to the GlusterNode's namespace
  desiredState: disabled
status:
  currentState: disabled

Node CR - External (created by admin, mutable by admin)

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterNode
metadata:
  name: some.node.net
  namespace: gcs
  cluster: my-cluster
  zone: az3
  nodeGroup: nodes-external
spec:
  credentialsSecret: # required
    name: ssh-secret
    namespace: gcs # optional, defaults to the GlusterNode's namespace
  storage:
  - devices: 
      - /dev/sdb
  desiredState: active
status:
  currentState: unavailable

This all seems somewhat fiddly, but I think it's worth it to try and strike a balance between flexibility and maintenance overhead. If we wanted, we could ship utilities to generate examples inside the Operator container, such that you could run something like kubectl exec <anthill_pod> /gen-cluster-template.sh > cluster.yaml and have something to start from.

I have some more thoughts on cluster/node state, but I'll save those for another comment. :)

JohnStrunk commented 5 years ago

Sorry to have disappeared for so long on this one...

@phlogistonjohn asked:

One thing I expected to see but didn't was a mechanism to specify what images should be used.

Since the operator needs to be able to manage upgrades (potentially a multi-step process walking across versions of server and csi pods), it needs to have a manifest knowing what transitions are legal (and tested). This precludes allowing the admin to specify arbitrary images & tags. The actual info will be a part of the manifest bundled w/ the operator. This is similar to how OLM works.

JohnStrunk commented 5 years ago

@jarrpa The configurability seems to have exploded. :) If I back up a step here, I think we want 3 possible configurations for a gluster pod:

  1. Usage of block PV (desired)
  2. Usage of device in /dev (as stopgap until 1 is everywhere)
  3. External nodes

I don't think I want to expose directly specifying the PVCs that would be used in (1). Do you have a use case in mind where that is important?

JohnStrunk commented 5 years ago

We technically don't really need the mode field at all. Speaking from Go, we could take a page from the k8s PersistentVolume spec and just have a ManagementType field that's anonymous in the parent struct...

If I'm understanding correctly this would be:

management:
  automatic:
    # stuff for auto

or

management:
  manual:
    # stuff for manual

but an error if

management:
  automatic:
    # stuff
  manual:
    # stuff

so the mode: auto|manual becomes superfluous. I like it. :+1:

JohnStrunk commented 5 years ago

I'm convinced that normal people think of their devices as /dev/sdX etc., but that's not "safe" for long-term use. I think we're relatively safe if we are up front about saying these device nodes are what's used for initial discovery, not long-term use. I think it could be sufficient if the operator only uses these devices to feed the next step in the management chain (e.g. heketi or gd2) and then relies on those subsystems to come back to the operator with a stable identifier?

I think this is really dangerous. The CRs hold the authoritative, declarative state. The only (almost) safe thing I can think of would be for the operator to resolve to UUIDs, then overwrite the sdX entries w/ UUIDs in the CR. Putting it in the status: doesn't fix the problem, nor does trying to preserve an sdX==>UUID mapping someplace. It works if nothing will ever be changed in the future (add or remove), but breaks as soon as changes are allowed. Complicating this is that the operator modifying the admin's settings is frowned upon (and a race).

JohnStrunk commented 5 years ago

So we're talking about splitting the "template" functionality from the "failure domain" specification. Is this really necessary? I feel like it's bringing in added room for misconfiguration. Leaving them tied together means you can't put "dev"-backed pods in the same domain as "pv"-backed or external, but does that really matter?

@phlogistonjohn brought up different templates for different server configs... if we mandate UUID device specification, there aren't really any templates when using /dev, so they could all go in the same zone. Dev-based zones would require manual creation of Node CRs, so they need not be homogeneous, and the operator wouldn't overwrite them.

jarrpa commented 5 years ago

The configurability seems to have exploded. :)

It's a bad habit of mine. ;) See my openshift-ansible work.

If I back up a step here, I think we want 3 possible configurations for a gluster pod:

  1. Usage of block PV (desired)
  2. Usage of device in /dev (as stopgap until 1 is everywhere)
  3. External nodes

I don't think I want to expose directly specifying the PVCs that would be used in (1). Do you have a use case in mind where that is important?

As always, I'm trying to think about on-prem edge cases, because that's where the challenge is going to lie. I'm thinking of situations where (1) we want to use local block PVs, (2) dynamic provisioning of local block PVs doesn't exist, and (3) the admin wants to ensure that very specific drives are used. We should also consider the possibility that users will come up with their own reasons we can't think of right now for not wanting to use StorageClasses, in whatever management mode. I can kinda guess at the code path for this when we already have the SC-based path, and it doesn't seem like it would be bad to implement and maintain.

I think this is really dangerous. The CRs hold the authoritative, declarative state. The only (almost) safe thing I can think of would be for the operator to resolve to UUIDs, then overwrite the sdX entries w/ UUIDs in the CR. Putting it in the status: doesn't fix the problem, nor does trying to preserve an sdX==>UUID mapping someplace. It works if nothing will ever be changed in the future (add or remove), but breaks as soon as changes are allowed. Complicating this is that the operator modifying the admin's settings is frowned upon (and a race).

I imagine this is something the local block provisioner team has hit. How are they dealing with it? Personally I'm fine with letting the admin put in whatever device path they want, so long as we provide a stern warning somewhere in our spec docs that they REALLY SHOULD use UUIDs. Especially for throw-away environments, let 'em have /dev/sdX.

So we're talking about splitting the "template" functionality from the "failure domain" specification. Is this really necessary? I feel like it's bringing in added room for misconfiguration.

I think it makes things easier to understand. It also splits the functionality so that those who aren't interested in configuring failure domains don't have to do anything with them. If no zone is specified, just assume all nodeTemplates are part of a single default zone.
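As a minimal sketch of that default (assuming the operator synthesizes a single implicit zone when none is given):

apiVersion: "operator.gluster.org/v1alpha1"
kind: GlusterCluster
metadata:
  name: my-cluster
  namespace: gcs
spec:
  nodeTemplates:
  - name: nodes-dynamic
    dynamic:
      minNodes: 2
      maxNodes: 99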

My default stance is always against artificially constraining configurations "for the user's benefit". It is a disservice to competent and power users who want to do potentially interesting things with our technology. We always have the authority to say that whatever they're trying is unsupported and/or won't be fixed. :)

Leaving them tied together means you can't put "dev"-backed pods in the same domain as "pv"-backed or external, but does that really matter?

I don't think that mandating they be split is of any meaningful benefit or importance.

@phlogistonjohn brought up different templates for different server configs... if we mandate UUID device specification, there aren't really any templates when using /dev, so they could all go in the same zone. Dev-based zones would require manual creation of Node CRs, so they need not be homogeneous, and the operator wouldn't overwrite them.

I'm not sure I quite understand the meaning of "server templates" here. Is that different from the concept I presented of just having different node templates, each with their (one and only one) management mode/type? Also, as above, I see no concrete reason to prevent users from using dev letter names so long as we make it clear in all our examples and documentation that UUIDs are safer.

JohnStrunk commented 5 years ago

I imagine this is something the local block provisioner team has hit. How are they dealing with it? Personally I'm fine with letting the admin put in whatever device path they want, so long as we provide a stern warning somewhere in our spec docs that they REALLY SHOULD use UUIDs. Especially for throw-away environments, let 'em have /dev/sdX.

They are going to permit specifying either way.

I went looking for a good explanation of the options and found this: https://wiki.archlinux.org/index.php/persistent_block_device_naming. After reading that page, I'm not sure there's a good answer that would work in all cases. Further, the UUID I was hoping to use... I have now learned it's actually pulled from the file system, so it's useless for raw block devices. Even items in by-id such as wwn or ata* aren't guaranteed to be there.

I think our only option is to let them specify whatever path they want and warn them that it must be stable.
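In practice that probably means recommending the persistent names under /dev/disk/ where they exist, e.g. something like this in a storage template (the device paths are illustrative; which symlinks are present varies by hardware):

storageTemplates:
- name: storage2
  devices:
  - /dev/disk/by-id/wwn-0x5000c500a1b2c3d4    # illustrative WWN; stable when the drive exposes one
  - /dev/disk/by-path/pci-0000:00:1f.2-ata-2  # stable as long as the slot/cabling doesn't change
  - /dev/sdb                                  # allowed, but only safe if it never gets renamed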

jarrpa commented 5 years ago

After reading that page, I'm not sure there's a good answer that would work in all cases. Further, the UUID I was hoping to use... I have now learned it's actually pulled from the file system, so it's useless for raw block devices. Even items in by-id such as wwn or ata* aren't guaranteed to be there.

I think our only option is to let them specify whatever path they want and warn them that it must be stable.

Oof... yeah, all right.

Any chance you can put together an update to the design doc with the things you're willing to incorporate? Or any chance we can collaborate on a branch somewhere so I can propose commits?

JohnStrunk commented 5 years ago

Any chance you can put together an update to the design doc with the things you're willing to incorporate? Or any chance we can collaborate on a branch somewhere so I can propose commits?

I'm aiming for tomorrow. I need to get this and the csi driver StorageClass definition done for GCS/0.2.

humblec commented 5 years ago

@JohnStrunk late to the party, but I've started my review. Will push more; however, one high-level question I have on our CRs is: how do our own topology-based CRs play with topology-based volume provisioning in Kube/OCP? Isn't it going to confuse admins?

phlogistonjohn commented 5 years ago

Catching up on this discussion after a while of not having much time to look at it; excuse me if I reiterate anything previously discussed. I tried to read every update, but I may not have fully absorbed everything.

After reading that page, I'm not sure there's a good answer that would work in all cases. Further, the UUID I was hoping to use... I have now learned it's actually pulled from the file system, so it's useless for raw block devices. Even items in by-id such as wwn or ata* aren't guaranteed to be there.

Yup. I found the same thing while researching options for a request for similar behavior natively baked into Heketi.

I think our only option is to let them specify whatever path they want and warn them that it must be stable.

Since the operator is not the end of the management chain, and we're going to have things like gd2/heketi/etc. in the mix, my thought is to simply (strongly) recommend persistent devices, but otherwise treat the given devices as a filter. As in, "OK gd2/heketi, feel free to absorb these devices into your management scheme," and those subsystems should be expected to map whatever is currently /dev/sdX in a persistent manner.

phlogistonjohn commented 5 years ago

Since the operator needs to be able to manage upgrades (potentially a multi-step process walking across versions of server and csi pods), it needs to have a manifest knowing what transitions are legal (and tested). This precludes allowing the admin to specify arbitrary images & tags. The actual info will be a part of the manifest bundled w/ the operator. This is similar to how OLM works.

OK, I'm fairly ignorant of OLM. I can buy that strategy for most production scenarios, but it strikes me as fairly rigid to develop against. I'm fine with leaving it out of the CRD, but when it comes time to code/implement I hope there's some mechanism where I can slot in my custom test build and have that get deployed without having to compile the operator from scratch.

JohnStrunk commented 5 years ago

Since the operator is not the end of the management chain, and we're going to have things like gd2/heketi/etc. in the mix, my thought is to simply (strongly) recommend persistent devices, but otherwise treat the given devices as a filter. As in, "OK gd2/heketi, feel free to absorb these devices into your management scheme," and those subsystems should be expected to map whatever is currently /dev/sdX in a persistent manner.

I can't figure out how to make this work. Consider:

JohnStrunk commented 5 years ago

Since the operator needs to be able to manage upgrades (potentially a multi-step process walking across versions of server and csi pods), it needs to have a manifest knowing what transitions are legal (and tested). This precludes allowing the admin to specify arbitrary images & tags. The actual info will be a part of the manifest bundled w/ the operator. This is similar to how OLM works.

OK, I'm fairly ignorant of OLM. I can buy that strategy for most production scenarios, but it strikes me as fairly rigid to develop against. I'm fine with leaving it out of the CRD, but when it comes time to code/implement I hope there's some mechanism where I can slot in my custom test build and have that get deployed without having to compile the operator from scratch.

My plan here is that there would be a text manifest in the container image. For development, you are free to adjust it however you see fit. A release, however, contains a fixed version of the manifest which points to a very specific set of images that define "version X".
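A hypothetical sketch of what such a bundled manifest might contain (the file name, fields, and image references are all illustrative; no format has been defined yet):

# manifest.yaml baked into the operator image (hypothetical)
version: "0.2"
images:
  glusterServer: docker.io/gluster/gluster-server:0.2    # illustrative image references
  csiDriver: docker.io/gluster/gluster-csi-driver:0.2
upgradesFrom:
  - "0.1"    # only listed transitions are considered legal (and tested)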

JohnStrunk commented 5 years ago

@JohnStrunk late to the party, but I've started my review. Will push more; however, one high-level question I have on our CRs is: how do our own topology-based CRs play with topology-based volume provisioning in Kube/OCP? Isn't it going to confuse admins?

Unifying would be very cool. Do you have suggestions here?

The topology stuff I've seen seems very cloud-provider specific. The current link is this (AWS example): if you have a multi-AZ cluster, you would want to create 3 "zones" for your Gluster cluster, specifying nodeAffinity such that each zone lands in a specific AWS AZ, and the StorageClass for that zone's PVs would have its affinity set to grab EBS volumes from that same AZ. From my POV, you'd want to name your gluster zone "east-1a" if it's going to be running in AWS AZ "east-1a". I think that would make it fairly straightforward.
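Concretely, that could pair each gluster zone with a StorageClass restricted to the matching AZ, along these lines (a sketch; the zone name and the beta topology label follow the examples above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-us-east-1a        # referenced by the gluster zone running in that AZ
provisioner: kubernetes.io/aws-ebs
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-east-1a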

JohnStrunk commented 5 years ago

Progress update: We discussed the CRD in the operator meeting this morning.

I'm going to make another pass through the comments, then try and update the real doc.

JohnStrunk commented 5 years ago

I know you thought it'd never happen. :skull: But I managed to update this PR.

JohnStrunk commented 5 years ago

Anyone else?