arangodb / arangodb-docker

Docker container for ArangoDB
Apache License 2.0

Cannot set tolerations/nodeSelector for temporary arangodb-cluster-id pod #110

Closed by ddelange 1 year ago

ddelange commented 1 year ago

Hi! :wave:

I'm trying to deploy an ArangoDeployment to a mixed amd64/arm64 cluster, where the arm64 nodes have a NoSchedule taint.

When I boot a fresh cluster with spec.architecture: [arm64], a temporary arangodb-cluster-id-xxx pod is created with a hard (required) nodeAffinity for arm64, but there is no way to add tolerations or a nodeSelector to it. This means the pod can't be scheduled on any node, and the cluster boot sequence hangs.

When I change to spec.architecture: [amd64, arm64], the temporary pod is allowed to schedule on an amd64 node in the cluster, ~and the rest of the pods will have the toleration for arm64 and so will schedule on the arm64 nodes~ (as we also have a PreferNoSchedule taint on the amd64 nodes). This is an acceptable workaround for now, but I'd rather specify spec.architecture: [arm64] and get a successful boot.

Deleted the ArangoDeployment from above (but with arm64 in the spec), and now trying to re-create it.

The operator creates the following id Pod, with arm64 nodeAffinity but without arm64 tolerations. We have tolerations in the ArangoDeployment for the coordinator, agent, and prmr (DB-Server) groups, because we have a mixed amd64/arm64 cluster and all arm64 nodes are tainted so that they are opt-in. Now, because of the hard nodeAffinity combined with the missing tolerations on the id Pod, our ArangoDeployment won't boot at all 😅
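
For context, the scheduling-related part of our ArangoDeployment looks roughly like this (a trimmed-down sketch, not the full manifest; the per-group `tolerations` field names follow the kube-arangodb CRD, and everything unrelated to scheduling is omitted):

```yaml
apiVersion: database.arangodb.com/v1
kind: ArangoDeployment
metadata:
  name: arangodb-cluster
spec:
  mode: Cluster
  image: arangodb/arangodb-preview:3.10.0-beta.1
  architecture:
  - arm64
  # our arm64 nodes carry an opt-in taint arch=arm64:NoSchedule, so every
  # server group gets a matching toleration; the temporary id pod is not
  # covered by any of these groups
  coordinators:
    tolerations:
    - effect: NoSchedule
      key: arch
      operator: Equal
      value: arm64
  agents:
    tolerations:
    - effect: NoSchedule
      key: arch
      operator: Equal
      value: arm64
  dbservers:
    tolerations:
    - effect: NoSchedule
      key: arch
      operator: Equal
      value: arm64
```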

arangodb-cluster-id-162f0e

```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-09-23T07:48:45Z"
  labels:
    app: arangodb
    arango_deployment: arangodb-cluster
    deployment.arangodb.com/member: 162f0e
    role: id
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:arango_deployment: {}
          f:deployment.arangodb.com/member: {}
          f:role: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"6054f633-c606-4f75-b7ee-a28430e1b952"}: {}
      f:spec:
        f:affinity:
          .: {}
          f:nodeAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution: {}
          f:podAntiAffinity:
            .: {}
            f:preferredDuringSchedulingIgnoredDuringExecution: {}
        f:containers:
          k:{"name":"server"}:
            .: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:ports:
              .: {}
              k:{"containerPort":8529,"protocol":"TCP"}:
                .: {}
                f:containerPort: {}
                f:name: {}
                f:protocol: {}
            f:resources: {}
            f:securityContext:
              .: {}
              f:capabilities:
                .: {}
                f:drop: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/data"}:
                .: {}
                f:mountPath: {}
                f:name: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:hostname: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:subdomain: {}
        f:terminationGracePeriodSeconds: {}
        f:tolerations: {}
        f:volumes:
          .: {}
          k:{"name":"arangod-data"}:
            .: {}
            f:emptyDir: {}
            f:name: {}
    manager: arangodb_operator
    operation: Update
    time: "2022-09-23T07:48:45Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .: {}
          k:{"type":"PodScheduled"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
    manager: kube-scheduler
    operation: Update
    subresource: status
    time: "2022-09-23T07:48:45Z"
  name: arangodb-cluster-id-162f0e
  namespace: aa-data-api
  ownerReferences:
  - apiVersion: database.arangodb.com/v1
    controller: true
    kind: ArangoDeployment
    name: arangodb-cluster
    uid: 6054f633-c606-4f75-b7ee-a28430e1b952
  resourceVersion: "109818868"
  uid: cf969e3d-6bc1-4395-a774-070b6b629c1e
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - arm64
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: arangodb
              arango_deployment: arangodb-cluster
              role: id
          topologyKey: kubernetes.io/hostname
        weight: 1
  containers:
  - command:
    - /usr/sbin/arangod
    - --server.authentication=false
    - --server.endpoint=tcp://[::]:8529
    - --database.directory=/data
    - --log.output=+
    image: arangodb/arangodb-preview:3.10.0-beta.1
    imagePullPolicy: IfNotPresent
    name: server
    ports:
    - containerPort: 8529
      name: server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: arangod-data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mzlkl
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: arangodb-cluster-id-162f0e
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  subdomain: arangodb-cluster-int
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  - effect: NoExecute
    key: node.alpha.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 5
  volumes:
  - emptyDir: {}
    name: arangod-data
  - name: kube-api-access-mzlkl
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-09-23T07:48:45Z"
    message: '0/22 nodes are available: 17 node(s) had taint {arch: arm64}, that the pod didn''t tolerate, 2 node(s) didn''t match Pod''s node affinity/selector, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
```

Originally posted by @ddelange in https://github.com/arangodb/arangodb-docker/issues/53#issuecomment-1255914415

ddelange commented 1 year ago

Correction: the cluster boots successfully with my dual-architecture setup, but I think that's also a bug:

With that setup, the prmr pods end up with only one nodeAffinity term (amd64), so there will still be no pods scheduled onto the arm64 nodes:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
```
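
What I would have expected for a dual-architecture deployment is a single selector term that matches either architecture, something along these lines (illustrative; not what the operator currently generates):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # a single term matching both values allows scheduling on either
        # amd64 or arm64 nodes
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
            - arm64
```
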
ddelange commented 1 year ago

When I switch the order in spec.architecture, we're back at the original issue, where the temporary pod doesn't schedule. It seems that only the first entry in spec.architecture is respected:

> 0/13 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 8 node(s) had taint {arch: arm64}, that the pod didn't tolerate.
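
Concretely, "switching the order" means changing the spec from `[amd64, arm64]` to the following, which apparently gets reduced to just its first entry (arm64) when the id pod's nodeAffinity is generated:

```yaml
spec:
  architecture:
  - arm64
  - amd64
```
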
ddelange commented 1 year ago

FWIW, in the original implementation there was an error, "Only one architecture type is supported currently", but that check seems to have disappeared before/during/after merging that PR.

ddelange commented 1 year ago

So I've now hacked our toleration

```yaml
  - effect: NoSchedule
    key: arch
    operator: Equal
    value: arm64
```

into the spec of the temporary pod, and the cluster has now successfully booted on arm64.
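
To spell it out, the id pod only became schedulable once its tolerations contained our arm64 taint in addition to the defaults the operator sets. Since Kubernetes permits additive updates to spec.tolerations on an existing Pod, this can be applied with kubectl edit on the pending pod; the resulting list looks roughly like:

```yaml
tolerations:
# defaults set by the operator (see the pod dump above)
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 5
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 5
- effect: NoExecute
  key: node.alpha.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 5
# added by hand so the pod can land on our tainted arm64 nodes
- effect: NoSchedule
  key: arch
  operator: Equal
  value: arm64
```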

ddelange commented 1 year ago

> When I switch the order in spec.architecture, we're back at the original issue, where the temporary pod doesn't schedule. It seems that only the first entry in spec.architecture is respected:
>
> 0/13 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) were unschedulable, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 8 node(s) had taint {arch: arm64}, that the pod didn't tolerate.

opened https://github.com/arangodb/kube-arangodb/issues/1140

ddelange commented 1 year ago

cc @jwierzbo from https://github.com/arangodb/arangodb-docker/issues/53#issuecomment-1256284798

I guess this issue should actually live in kube-arangodb 😅 should I close it here and re-open there?

dothebart commented 1 year ago

Yes, I think as long as our Docker container doesn't do anything wrong in this area, this issue doesn't belong here; it should be discussed in the http://github.com/arangodb/kube-arangodb repo. The multi-arch support of the container itself should be working properly as of ArangoDB 3.10, right?

ddelange commented 1 year ago

Yes, all green so far! I'll re-open there.

ddelange commented 1 year ago

closing in favor of https://github.com/arangodb/kube-arangodb/issues/1141