fluid-cloudnative / fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
https://fluid-cloudnative.github.io/
Apache License 2.0

[BUG] Workload Pod scheduling does not satisfy the affinity of Dataset #1729

Open abowloflrf opened 2 years ago

abowloflrf commented 2 years ago

[BUG] Workload Pod scheduling does not satisfy the affinity of Dataset

What is your environment (Kubernetes version, Fluid version, etc.)?

Fluid version: fluid-dataset-controller:v0.7.0-3d66068
Kubernetes version: v1.20.8

Describe the bug

I'm following the example in the official documentation to test the dataset affinity feature.

Nodes with labels:

❯ kubectl get nodes -L test
NAME          STATUS   ROLES    AGE    VERSION   TEST
172.16.0.10   Ready    <none>   99d    v1.20.8   
172.16.0.39   Ready    <none>   68d    v1.20.8   
172.16.0.40   Ready    <none>   68d    v1.20.8   
172.16.0.6    Ready    <none>   109d   v1.20.8   lrf
172.16.80.4   Ready    <none>   80d    v1.20.8  

After creating the dataset with affinity, the Alluxio Worker Pods are created with the same affinity settings and scheduled as expected.

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo2
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: test
              operator: In
              values:
                - lrf
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo2
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /tmp/cache
        quota: 2Gi
        high: "0.95"
        low: "0.7"

But when I then created a Deployment mounting the generated PVC, the workload Pods were not scheduled according to the Dataset affinity setting (node label test=lrf):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: lrf-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lrf-nginx
  template:
    metadata:
      labels:
        app: lrf-nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - lrf-nginx
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: lrf-nginx
          image: nginx
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - persistentVolumeClaim:
            claimName: demo2
          name: data

❯ kubectl get pods -o wide -w
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
demo2-fuse-7hfv9                1/1     Running   0          23m   172.16.0.6    172.16.0.6    <none>           <none>
demo2-fuse-lddz9                1/1     Running   0          23m   172.16.80.4   172.16.80.4   <none>           <none>
demo2-master-0                  2/2     Running   0          24m   172.16.80.4   172.16.80.4   <none>           <none>
demo2-worker-0                  2/2     Running   0          23m   172.16.0.6    172.16.0.6    <none>           <none>
demo2-worker-1                  0/2     Pending   0          23m   <none>        <none>        <none>           <none>
lrf-nginx-7f4cbcb4db-mllbs      1/1     Running   0          23m   10.17.0.244   172.16.0.6    <none>           <none>
lrf-nginx-7f4cbcb4db-s8q9q      1/1     Running   0          23m   10.17.70.89   172.16.80.4   <none>           <none>

What you expect to happen:

The workload Pod lrf-nginx-7f4cbcb4db-s8q9q should not have been scheduled onto 172.16.80.4, which does not carry the test=lrf label required by the Dataset affinity.
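
Until that's fixed, a minimal workaround sketch (plain Kubernetes scheduling, independent of Fluid) would be to duplicate the Dataset's required node affinity on the workload's Pod template:

```yaml
# Sketch only: copy the Dataset's nodeAffinity onto the Deployment so the
# scheduler enforces it even though the PV carries no volume node affinity.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test
                    operator: In
                    values:
                      - lrf
```

Note that combined with the existing podAntiAffinity and only one node labeled test=lrf, the second replica would then stay Pending, just like demo2-worker-1 does above.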

How to reproduce it

Additional Information

The PV created by Fluid is shown below. Is this because Fluid did not set node affinity on the PV correctly?

https://github.com/fluid-cloudnative/fluid/blob/c89748dc79a828f6bffbe26a63f7ae6b59c93e83/pkg/utils/dataset/volume/create.go#L34-L95

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    CreatedBy: fluid
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-04-02T02:53:02Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    fluid.io/s-default-demo2: "true"
  name: default-demo2
  resourceVersion: "41394905"
  uid: b7196b33-b490-4f96-84d7-9a270dc03034
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: demo2
    namespace: default
    resourceVersion: "41394900"
    uid: 5b8514d4-9909-46a9-9052-80f11e83a617
  csi:
    driver: fuse.csi.fluid.io
    volumeAttributes:
      fluid_path: /runtime-mnt/alluxio/default/demo2/alluxio-fuse
      mount_type: fuse.alluxio-fuse
    volumeHandle: default-demo2
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fluid
  volumeMode: Filesystem
status:
  phase: Bound
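
For reference: Kubernetes restricts Pod placement through a PV only when the PV itself carries spec.nodeAffinity (volume node affinity), which the scheduler enforces at volume-binding time. The PV above has no such field. A hypothetical variant that would pin consumers to the labeled node could look like this (same PV with the affinity added; not what Fluid currently generates):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: default-demo2
spec:
  accessModes:
    - ReadOnlyMany
  capacity:
    storage: 100Gi
  csi:
    driver: fuse.csi.fluid.io
    volumeHandle: default-demo2
  # Volume node affinity: the scheduler only places Pods that use this PV
  # onto nodes matching these terms.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: test
              operator: In
              values:
                - lrf
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fluid
  volumeMode: Filesystem
```
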
TrafalgarZZZ commented 2 years ago

Hi, @abowloflrf. First of all, many thanks for reporting this issue. From my perspective, it's the stale document on the website that gave you misleading information. Apologies for that.

The document you followed was written for Fluid versions before v0.6.0, and the behavior differs in Fluid v0.7.0. Would you mind trying the document in our GitHub repo here? It's the latest version, matching Fluid v0.7.0.

Besides, if you'd like workload Pods to be co-located with the data cache in Fluid v0.7.0+, please follow the "Pod Scheduling Optimization" (Pod调度优化) document.

Again, thanks for reporting this issue. We'll fix the website soon.

abowloflrf commented 2 years ago


@TrafalgarZZZ That resolves my confusion. Thanks for replying!

I'll keep this issue open until the website is updated.