crate / crate-operator

The CrateDB Kubernetes Operator provides a convenient way to run CrateDB clusters inside Kubernetes.
https://crate.io
Apache License 2.0

TopologySpreadConstraints: add minDomains=3 #645

Closed goat-ssh closed 2 months ago

goat-ssh commented 3 months ago

Summary of changes

This helps the Cluster Autoscaler provision nodes in zones that are currently scaled down to 0 nodes.

It requires at least 3 nodes, one per zone (we have 3 zones).

https://kubernetes.io/blog/2023/04/17/fine-grained-pod-topology-spread-features-beta/#kep-3022-min-domains-in-pod-topology-spread
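For illustration, a minimal sketch of what a zone spread constraint with minDomains=3 could look like when built as a plain Python dict. This is not the operator's actual code: the helper name `zone_spread_constraint`, the label key `app.kubernetes.io/name`, and the `maxSkew` value are assumptions. What Kubernetes itself requires is that `minDomains` is greater than 0 and that it is only used with `whenUnsatisfiable: DoNotSchedule`.

```python
# Hypothetical helper; the operator's real implementation may differ.
def zone_spread_constraint(cluster_name, min_domains=3):
    """Build one topologySpreadConstraints entry as a plain dict."""
    if min_domains < 1:
        raise ValueError("minDomains must be greater than 0")
    return {
        "maxSkew": 1,  # assumed value for this sketch
        "topologyKey": "topology.kubernetes.io/zone",
        # minDomains is only valid together with DoNotSchedule.
        "whenUnsatisfiable": "DoNotSchedule",
        "minDomains": min_domains,
        "labelSelector": {
            # Assumed label; use whatever labels the pods really carry.
            "matchLabels": {"app.kubernetes.io/name": cluster_name},
        },
    }


constraint = zone_spread_constraint("my-cluster")
```

With fewer than `minDomains` eligible zones, the scheduler treats the global minimum as 0, so pods beyond `maxSkew` per zone stay pending. That pending state is presumably what prompts the Cluster Autoscaler to add nodes in the empty zones.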

Tested with https://gist.github.com/goat-ssh/612dc0e2931fd1889428cf18fb71b268 by scaling it to 3+ replicas afterwards

I've also upgraded the k8s client to the latest version and fixed a test.


goat-ssh commented 3 months ago

We have failing tests: https://jenkins.crate.io/job/cloud/job/crate-operator/job/nightly-operator-integration-test/1421/#showFailuresLink

goat-ssh commented 3 months ago

Some events caught from Azure:

kind: Event
apiVersion: v1
metadata:
  name: kopf-event-55nnt
  generateName: kopf-event-
  namespace: 925dcca4-0ced-4157-a51d-843ec1bdcf56
  uid: b1b8ee86-4737-48bd-a016-3a2ba5516cee
  resourceVersion: '5932'
  creationTimestamp: '2024-08-23T09:31:59Z'
  managedFields:
    - manager: kopf
      operation: Update
      apiVersion: v1
      time: '2024-08-23T09:31:59Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:action: {}
        f:eventTime: {}
        f:firstTimestamp: {}
        f:involvedObject: {}
        f:lastTimestamp: {}
        f:message: {}
        f:metadata:
          f:generateName: {}
        f:reason: {}
        f:reportingComponent: {}
        f:reportingInstance: {}
        f:source:
          f:component: {}
        f:type: {}
involvedObject:
  kind: CrateDB
  namespace: 925dcca4-0ced-4157-a51d-843ec1bdcf56
  name: acosta
  uid: 3392822b-a00a-402f-bac0-625c7bdefbe3
  apiVersion: cloud.crate.io/v1
reason: Logging
message: >+
  Handler 'cluster_create/bootstrap' failed with an exception. Will retry.

  Traceback (most recent call last):
    File "/var/lib/jenkins/workspace/cloud/crate-operator/nightly-operator-integration-test/env/lib/python3.8/site-packages/kopf/_core/actions/execution.py", line 279, in execute_handler_once
      result = await invoke_handler(
    File "/var/lib/jenkins/workspace/cloud/crate-operator/nightly-operator-integration-test/env/lib/python3.8/site-packages/kopf/_core/actions/execution.py", line 374, in invoke_ha...': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'acb3ce89-aebf-44cd-a52f-5022b5213ef9', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'c256a070-3bb7-497e-9548-845e4156a57a', 'Date': 'Fri, 23 Aug 2024 09:31:59 GMT', 'Content-Length': '211')>
  HTTP response body:
  {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"error
  decoding patch: json: cannot unmarshal object into Go value of type
  []handlers.jsonPatchOp","reason":"BadRequest","code":400}

source:
  component: kopf
firstTimestamp: '2024-08-23T09:31:59Z'
lastTimestamp: '2024-08-23T09:31:59Z'
type: Error
eventTime: '2024-08-23T09:31:59.329355Z'
action: Action?
reportingComponent: kopf
reportingInstance: dev
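
The 400 response above says the API server tried to decode the patch as a list of JSON Patch operations but received a JSON object. A minimal sketch of the two body shapes and their content types follows. The helper `patch_content_type` is hypothetical, and it is only an assumption that a body/content-type mismatch is the cause here, since the truncated traceback does not show the request itself.

```python
import json

JSON_PATCH = "application/json-patch+json"    # RFC 6902: a list of operations
MERGE_PATCH = "application/merge-patch+json"  # RFC 7386: a partial object


def patch_content_type(body):
    """Pick a content type that matches the shape of the patch body."""
    return JSON_PATCH if isinstance(body, list) else MERGE_PATCH


# Merge patch: a JSON object mirroring the fields to change.
merge_body = {"metadata": {"annotations": {"example": "value"}}}

# JSON Patch: a JSON array of explicit operations.
json_patch_body = [
    {"op": "add", "path": "/metadata/annotations/example", "value": "value"},
]

for body in (merge_body, json_patch_body):
    print(patch_content_type(body), json.dumps(body))
```

Sending `merge_body` with the `application/json-patch+json` content type would produce this kind of "cannot unmarshal object" error, because the server expects an array for that content type.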

goat-ssh commented 3 months ago

Adding full logs from Azure events: cloud-k8s-HNbmKC6f_Events_2024-08-23T09_40_21.785Z.json

goat-ssh commented 2 months ago

Closing in favor of https://github.com/crate/crate-operator/pull/653