admiraltyio / admiralty

A system of Kubernetes controllers that intelligently schedules workloads across clusters.
https://admiralty.io
Apache License 2.0

Wrong Pod assignment #201

Open rostrzycki-7bc opened 10 months ago

rostrzycki-7bc commented 10 months ago

I created 3 local clusters, as described on the Quick Start page.

I tried to test a very simple "cloud bursting" scenario on these clusters, as suggested in the Cloud Bursting section of the documentation.

Then I marked the "default" namespace as multicluster schedulable with kubectl --context kind-cd label ns default multicluster-scheduler=enabled and deployed the following Deployment on the kind-cd cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-test
  labels:
    app: hello-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: hello-test
  template:
    metadata:
      labels:
        app: hello-test
      annotations:
        multicluster.admiralty.io/elect: ""
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: [eu]
      containers:
      - image: nginxdemos/hello:plain-text
        name: hello-test
        resources:
          requests:
            cpu: 25m
            memory: 64Mi
          limits:
            cpu: 25m
            memory: 64Mi

I expected that Pods created from this Deployment would be placed on the "eu" node in the kind-eu cluster, but they weren't. They were spread across both clusters ("us" and "eu"). I should add that the nodes were properly labeled, so I assume there is a bug in the Admiralty scheduler.

I also reproduced this "cloud bursting" scenario on Minikube with 2 local clusters. The result was the same: preferredDuringSchedulingIgnoredDuringExecution is not taken into account.

adrienjt commented 8 months ago

The problem here is that the preferred affinity is used as a constraint by the candidate schedulers in the target clusters. Since it's preferred, not required, both clusters tolerate the constraint, and the (unconstrained) proxy scheduler in the source cluster spreads by default.

You need to use the preferred affinity as a proxy scheduler constraint, using either the multicluster.admiralty.io/use-constraints-from-spec-for-proxy-pod-scheduling: "" pod annotation (if you don't need candidate scheduler constraints) or the multicluster.admiralty.io/proxy-pod-scheduling-constraints: <PodSpec yaml> pod annotation (if you need constraints at both levels).
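For example, here is a minimal sketch of the first option applied to the Deployment above (only the pod template annotations change; I'm assuming the rest of the spec stays as is, and the nodeAffinity is then applied by the proxy scheduler in the source cluster rather than by the candidate schedulers):

  template:
    metadata:
      labels:
        app: hello-test
      annotations:
        multicluster.admiralty.io/elect: ""
        # tell the proxy scheduler to use the constraints from the pod spec
        multicluster.admiralty.io/use-constraints-from-spec-for-proxy-pod-scheduling: ""
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: [eu]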

In practice, though, if one cluster is elastic and the other isn't, there's no need for a preferred affinity. Admiralty will send two candidate pods until one is scheduled, cancelling any scale-up in the elastic cluster if there's capacity in the inelastic cluster.
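Concretely, for the quick-start setup that would just be the Deployment above with the nodeAffinity block removed, keeping only the elect annotation (a sketch, relying on the default behavior described above):

  template:
    metadata:
      labels:
        app: hello-test
      annotations:
        multicluster.admiralty.io/elect: ""
    spec:
      containers:
      - image: nginxdemos/hello:plain-text
        name: hello-test
        resources:
          requests:
            cpu: 25m
            memory: 64Mi
          limits:
            cpu: 25m
            memory: 64Mi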