aenix-io / etcd-operator

New generation community-driven etcd-operator!
https://etcd.aenix.io
Apache License 2.0

[epic] API Design for EtcdCluster resource #109

Closed. kvaps closed this issue 6 months ago.

kvaps commented 7 months ago

We are going to release the MVP (v0.1.0) and we need a stable spec. Here is the corrected version of the spec from this proposal: https://github.com/aenix-io/etcd-operator/issues/62 (original author: @sergeyshevch).

I'm going to use this meta-issue to link all the parts needed to implement this:

---
apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: test
  namespace: default
spec:
  replicas: 3
  options: # map[string]string
    election-timeout: "1000"
    max-wals: "5"
    max-snapshots: "5"

  storage:
    emptyDir: {} # core.v1.EmptyDirVolumeSource Ready k8s type
    volumeClaimTemplate:
      metadata:
        labels:
          env: prod
        annotations:
          example.com/annotation: "true"
      spec: # core.v1.PersistentVolumeClaimSpec Ready k8s type
        storageClassName: gp3
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi

  podTemplate:
    metadata:
      labels:
        env: prod
      annotations:
        example.com/annotation: "true"
    spec:
      imagePullSecrets:  # core.v1.LocalObjectReference Ready k8s type
      - name: myregistrykey
      serviceAccountName: default
      affinity: {} # core.v1.Affinity Ready k8s type
      nodeSelector: {} # map[string]string
      tolerations: [] # core.v1.Toleration Ready k8s type
      securityContext: {} # core.v1.PodSecurityContext Ready k8s type
      priorityClassName: "low"
      topologySpreadConstraints: [] # core.v1.TopologySpreadConstraint Ready k8s type
      terminationGracePeriodSeconds: 30 # int64
      schedulerName: "default-scheduler"
      runtimeClassName: "legacy"
      readinessGates: [] # core.v1.PodReadinessGate Ready k8s type
      containers: # []v1.Container
      - name: etcd
        image: "quay.io/coreos/etcd:v3.5.12"
        imagePullPolicy: Always
        resources: # core.v1.ResourceRequirements Ready k8s type
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 200m
            memory: 200Mi
      volumes: [] # []v1.Volume

  serviceTemplate:
    metadata:
      labels:
        env: prod
      annotations:
        example.com/annotation: "true"
    spec: # core.v1.ServiceSpec Ready k8s type

  podDisruptionBudgetTemplate:
    metadata:
      labels:
        env: prod
      annotations:
        example.com/annotation: "true"
    spec:
      maxUnavailable: 1 # intstr.IntOrString
      minAvailable: 2

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-03-06T18:39:45Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-03-06T18:39:45Z"
    status: "True"
    type: Initialized

sergeyshevch commented 7 months ago

I don't like this idea, just like the podSpec field.

I guess a simple implementation with a flat list of parameters inside the spec would be simpler to understand and use.

Some good examples, in my opinion:

The proposed design closely matches the Kubernetes spec, but it looks like a lot of operators prefer a different, simpler approach.

I suggest building a few use cases like:

  • Simple cluster with a few replicas
  • Cluster with a custom image
  • Cluster with persistent storage
  • Cluster with resources
  • Cluster with all fields specified

Then we can look at both approaches and compare them more correctly.
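
For illustration only, a "flat" spec along these lines is presumably what is meant here; the field names below are hypothetical and not part of any agreed design:

apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: test
  namespace: default
spec:
  replicas: 3
  image: "quay.io/coreos/etcd:v3.5.12" # hypothetical flat field instead of podTemplate.spec.containers
  storageClassName: gp3                # hypothetical flat field instead of storage.volumeClaimTemplate
  storageSize: 10Gi                    # hypothetical flat field
  resources:                           # hypothetical flat field, core.v1.ResourceRequirements
    requests:
      cpu: 100m
      memory: 100Mi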

sergeyshevch commented 7 months ago

Also, I guess we need to be aware of the landscape around Kubernetes. Let's imagine this use case:

kvaps commented 7 months ago

> I guess a simple implementation with a flat list of parameters inside the spec would be simpler to understand and use.

It makes sense, I just don't like the idea of parameters repeating themselves. It's better to have an extendable spec without sugar than sugar without the opportunity to extend :)

> The proposed design closely matches the Kubernetes spec, but it looks like a lot of operators prefer a different, simpler approach.

I would also mention the ElasticSearch operator and piraeus-operator v2 (the second major version, reworked based on the mistakes of v1); both use podTemplate to define default fields for pods:

> I suggest building a few use cases like:
>
> • Simple cluster with a few replicas
> • Cluster with a custom image
> • Cluster with persistent storage
> • Cluster with resources
> • Cluster with all fields specified
>
> Then we can look at both approaches and compare them more correctly.

Great idea, I will prepare a PR with such examples and we can move the discussion into review.
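
As a sketch of the first use case under the proposed spec (assuming all template sections are optional and defaulted by the operator), a minimal manifest might look like this:

apiVersion: etcd.aenix.io/v1alpha1
kind: EtcdCluster
metadata:
  name: simple
  namespace: default
spec:
  replicas: 3
  storage:
    emptyDir: {} # ephemeral storage; no PVCs are created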

lllamnyp commented 7 months ago

I'm happy with the spec as of the time of this comment.

relyt0925 commented 7 months ago

One thing I would like to note: a number of folks I have talked to who run etcd clusters run them purely on nodes with local disk storage. Note that this storage can be wiped when nodes are updated (and therefore reinitialized back to an empty directory). Currently I see that done with etcd pods using an emptyDir and running in clustered mode.

The operator watches the state of the cluster (the etcd pods). If a pod dies or is removed, the operator calls into the etcd cluster to remove the member, then calls back into the cluster to add the member again, and also schedules the pod associated with that member. Effectively, this allows pods to move between nodes, tolerate failures, and automatically recover.

In the current spec I see we are using StatefulSets only. I don't think using a StatefulSet necessarily precludes this mode, but we need to be able to use local volumes. Additionally, the etcd controller then has to be smart enough to know when the node an etcd pod/volume was scheduled to has been fully removed, and to handle recreating the PV/PVC for the local volume so it can land on a different node.

More of a question: although not implemented in the controller backend yet (which I think can be extended), does the design support that use case?
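
For what it's worth, the storage side of this use case could probably be expressed with the proposed storage section by pointing the claim template at a local-volume StorageClass; the class name below is an assumption, and the member-replacement logic described above would still have to live in the operator:

spec:
  replicas: 3
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage # assumed StorageClass backed by local PVs
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
---
# Standard Kubernetes StorageClass for statically provisioned local volumes;
# WaitForFirstConsumer delays binding until the pod is scheduled, so the PV
# and the pod end up on the same node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer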

relyt0925 commented 7 months ago

Additionally, is there a mode where users can specify their own PKI infrastructure (CA, certs, secrets) for the etcd cluster?

kvaps commented 7 months ago

> Is there a mode where users can specify their own PKI infrastructure (CA, certs, secrets) for the etcd cluster?

Yeah, right now there is a suggestion from @lllamnyp to add an additional security section for that, but not in this iteration:

spec:
  security:
    serverTLSSecretRef: # secretRef
      name: server-tls-secret
    clientCertAuth: true # bool
    trustedCAFile: # secretRef
      name: trusted-tls-secret # Client certificates may be signed by a different CA than server certificates
    peerTLSSecretRef: # secretRef
      name: peer-tls-secret # However, the CA for intra-cluster certificates is the same for both incoming and outgoing requests
    peerClientCertAuth: true # bool

The implementation design is discussed here: https://github.com/aenix-io/etcd-operator/pull/87
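
For reference, the secrets referenced above would presumably be ordinary Kubernetes TLS Secrets along these lines; the exact expected keys are an assumption and are part of the design still being discussed in the linked PR:

apiVersion: v1
kind: Secret
metadata:
  name: server-tls-secret
  namespace: default
type: kubernetes.io/tls # standard Secret type with tls.crt / tls.key keys
data:
  tls.crt: <base64-encoded server certificate>
  tls.key: <base64-encoded server key>
---
apiVersion: v1
kind: Secret
metadata:
  name: trusted-tls-secret
  namespace: default
type: Opaque
data:
  ca.crt: <base64-encoded CA bundle used to verify client certificates>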

Kirill-Garbar commented 7 months ago

For now, most of the information regarding the auth and security topic is described in this issue: https://github.com/aenix-io/etcd-operator/issues/76. The PR referred to in the previous comment will include the results of that discussion and the implementation.