Cannot set tolerations/nodeSelector for temporary arangodb-cluster-id pod #110

Closed ddelange closed 1 year ago

ddelange commented 1 year ago

Hi! :wave:

I'm trying to deploy an ArangoDeployment to a mixed amd64/arm64 cluster, where the arm64 nodes have a NoSchedule taint.

When I boot a fresh cluster with spec.architecture: [arm64], the operator creates a temporary arangodb-cluster-id-xxx pod that has a hard nodeAffinity for arm64, but there is no way to add tolerations or a nodeSelector to it. The pod therefore can't be scheduled on any node, and the cluster boot sequence hangs.

When I change to spec.architecture: [amd64, arm64], the temporary pod is allowed to schedule on an amd64 node in the cluster, ~and the rest of the pods will have the toleration for arm and so will schedule on the arm nodes~ (as we also have a PreferNoSchedule taint on the amd64 nodes). This is an acceptable workaround for now, but I'd rather specify spec.architecture: [arm64] and get a successful boot.
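For reference, a minimal sketch of the kind of manifest described above. The name, namespace and image are taken from the Pod dump later in this issue, and the toleration mirrors the arch: arm64 taint. The apiVersion, the mode and the choice of server groups are assumptions, so check them against the kube-arangodb version in use.

```yaml
# Sketch only: values marked "assumed" are not taken from this issue.
apiVersion: database.arangodb.com/v1   # assumed API version
kind: ArangoDeployment
metadata:
  name: arangodb-cluster
  namespace: aa-data-api
spec:
  mode: Cluster                        # assumed
  image: arangodb/arangodb-preview:3.10.0-beta.1
  architecture:
  - arm64                              # the desired setting
  # Per-group tolerations so the servers can land on the tainted arm64 nodes.
  agents:
    tolerations: &arm64-toleration
    - key: arch
      operator: Equal
      value: arm64
      effect: NoSchedule
  dbservers:
    tolerations: *arm64-toleration     # assumed group
  coordinators:
    tolerations: *arm64-toleration     # assumed group
```

There is no equivalent field for the temporary id pod, which is the gap this issue is about.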

I deleted the ArangoDeployment from above (but with arm64 in the spec), and am now trying to re-create it.

The operator creates the following id Pod, with arm64 nodeAffinity, but without arm64 tolerations. We have tolerations in the ArangoDeployment for controller, agent, prmr, because we have a mixed amd64/arm64 cluster and all arm64 nodes are tainted such that they are opt-in. Now, because of the hard nodeAffinity but missing tolerations on the id Pod, our ArangoDeployment won't boot at all 😅


```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-09-23T07:48:45Z"
  labels:
    app: arangodb
    arango_deployment: arangodb-cluster 162f0e
    role: id
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:arango_deployment: {} {}
          f:role: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"6054f633-c606-4f75-b7ee-a28430e1b952"}: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution: {}
          f:podAntiAffinity:
            .: {}
            f:preferredDuringSchedulingIgnoredDuringExecution: {}
        f:containers:
          k:{"name":"server"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":8529,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:capabilities:
                .: {}
                f:drop: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/data"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:hostname: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:subdomain: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"arangod-data"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: arangodb_operator
    operation: Update
    time: "2022-09-23T07:48:45Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"PodScheduled"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-scheduler
    operation: Update
    subresource: status
    time: "2022-09-23T07:48:45Z"
  name: arangodb-cluster-id-162f0e
  namespace: aa-data-api
  ownerReferences:
  - apiVersion:
    controller: true
    kind: ArangoDeployment
    name: arangodb-cluster
    uid: 6054f633-c606-4f75-b7ee-a28430e1b952
  resourceVersion: "109818868"
  uid: cf969e3d-6bc1-4395-a774-070b6b629c1e
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key:
            operator: In
            values:
            - arm64
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: arangodb
              arango_deployment: arangodb-cluster
              role: id
          topologyKey:
        weight: 1
  containers:
  - command:
    - /usr/sbin/arangod
    - --server.authentication=false
    - --server.endpoint=tcp://[::]:8529
    -
    - --log.output=+
    image: arangodb/arangodb-preview:3.10.0-beta.1
    imagePullPolicy: IfNotPresent
    name: server
    ports:
    - containerPort: 8529
      name: server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: arangod-data
    - mountPath: /var/run/secrets/
      name: kube-api-access-mzlkl
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: arangodb-cluster-id-162f0e
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: arangodb-cluster-int
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key:
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key:
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key:
    operator: Exists
    tolerationSeconds: 5
  volumes:
  - emptyDir: {}
    name: arangod-data
  - name: kube-api-access-mzlkl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-23T07:48:45Z"
    message: '0/22 nodes are available: 17 node(s) had taint {arch: arm64}, that the pod didn''t tolerate, 2 node(s) didn''t match Pod''s node affinity/selector, 3 node(s) had taint { }, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
```

Originally posted by @ddelange in

ddelange commented 1 year ago

correction: the cluster boots successfully with my dual architecture setup, but i think that's also a bug:

with that setup, the prmr pods will end up with only one architecture in their nodeAffinity, and so there will still be no pods going onto the arm nodes:

        - matchExpressions:
          - key:
            operator: In
            values:
            - amd64
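
For comparison, if both entries of spec.architecture were honored, the required node affinity might look like the sketch below. The kubernetes.io/arch key is an assumption (it is the well-known architecture label; the key is not visible in the output above), and this is not actual operator output.

```yaml
# Hypothetical expected affinity, not actual operator output.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/arch   # assumed label key
          operator: In
          values:
          - amd64
          - arm64
```
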

ddelange commented 1 year ago

when i switch the order in spec.architecture, we're back at the original issue, where the temporary pod doesn't schedule. it seems that only the first entry in spec.architecture is respected:

0/13 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) were unschedulable, 3 node(s) had taint { }, that the pod didn't tolerate, 8 node(s) had taint {arch: arm64}, that the pod didn't tolerate.
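
The message above is consistent with arm64 nodes tainted roughly as in the sketch below. The node name and the label are hypothetical; the taint key, value and effect follow the arch: arm64 taint reported by the scheduler and the NoSchedule taint mentioned at the top of this issue.

```yaml
# Sketch of a tainted arm64 node; name and label are assumptions.
apiVersion: v1
kind: Node
metadata:
  name: example-arm64-node        # hypothetical
  labels:
    kubernetes.io/arch: arm64     # assumed well-known label
spec:
  taints:
  - key: arch
    value: arm64
    effect: NoSchedule
```

A pod needs both a matching node affinity and a toleration for this taint to land on such a node; the id pod only has the former.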

ddelange commented 1 year ago

fwiw in the original implementation there was an error "Only one architecture type is supported currently", but that seems to have disappeared before/during/after merging that PR

ddelange commented 1 year ago

so I now hacked our toleration

  - effect: NoSchedule
    key: arch
    operator: Equal
    value: arm64

into the spec of the temporary pod and the cluster now successfully booted on arm

ddelange commented 1 year ago

cc @jwierzbo from

I guess this issue should live in kube-arangodb actually 😅 should I close and re-open there?

dothebart commented 1 year ago

yes, I think as long as our Docker container doesn't do anything wrong regarding this topic, this issue doesn't belong here, and it should be discussed inside of the repo. The multiarch support of the container should be working properly as of ArangoDB 3.10, right?

ddelange commented 1 year ago

yes, all green until now! I'll re-open there

ddelange commented 1 year ago

closing in favor of