flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0

[Core feature] Update annotations of pods belonging to a Ray cluster in order to adopt Yunikorn gang scheduling #5575

Open 0yukali0 opened 2 months ago

0yukali0 commented 2 months ago

Motivation: Why do you think this is important?

Implementing this mechanism reduces the user's overhead of updating annotations by hand, and makes it possible to give a different task group name to the head pod and to each worker group, so that each group's status can be monitored under Yunikorn gang scheduling.

Goal: What should the final outcome look like, ideally?

Annotations of pods belonging to the head group:

yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30"
yunikorn.apache.org/taskGroupName: "taskgroup-raycluster-head"
yunikorn.apache.org/taskGroup: "
    TaskGroups: [
      {
        name: taskgroup-raycluster-head,
        minMember: 1,
        minResource: {
          cpu: 1,
          memory: 500Mi,
        },
        nodeSelector: ...,
        tolerations: ...,
        affinity: ...,
      },
      {
        name: taskgroup-raycluster-worker-0,
        minMember: 5,
        minResource: {
          cpu: 2,
          memory: 1Gi,
        },
      },
      {
        name: taskgroup-raycluster-worker-1,
        minMember: 5,
        minResource: {
          cpu: 2,
          memory: 1Gi,
        },
      },
    ]
    "

Pods of worker group 0 get the name: taskgroup-raycluster-worker-0
Pods of worker group 1 get the name: taskgroup-raycluster-worker-1
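The head-pod annotations above could be assembled programmatically. Below is a minimal Python sketch of that assembly; the helper name `build_task_groups_annotation` and the JSON encoding are assumptions for illustration, not part of any Flyte or Yunikorn API:

```python
import json

def build_task_groups_annotation(groups):
    """Serialize group specs into the yunikorn.apache.org/taskGroup value.

    Hypothetical helper: field names mirror the example above
    (name, minMember, minResource), and the value is JSON-encoded.
    """
    task_groups = [
        {
            "name": g["name"],
            "minMember": g["minMember"],
            "minResource": g["minResource"],
        }
        for g in groups
    ]
    return json.dumps(task_groups)

# One entry for the head group and one per worker group, matching the goal above.
groups = [
    {"name": "taskgroup-raycluster-head", "minMember": 1,
     "minResource": {"cpu": "1", "memory": "500Mi"}},
    {"name": "taskgroup-raycluster-worker-0", "minMember": 5,
     "minResource": {"cpu": "2", "memory": "1Gi"}},
    {"name": "taskgroup-raycluster-worker-1", "minMember": 5,
     "minResource": {"cpu": "2", "memory": "1Gi"}},
]

# The head pod carries the full task-group list plus the scheduling parameters.
head_annotations = {
    "yunikorn.apache.org/schedulingPolicyParameters": "placeholderTimeoutSeconds=30",
    "yunikorn.apache.org/taskGroupName": "taskgroup-raycluster-head",
    "yunikorn.apache.org/taskGroup": build_task_groups_annotation(groups),
}
```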

Describe alternatives you've considered

A mechanism that updates pod annotations according to the group each pod belongs to. The currently available way to activate Yunikorn gang scheduling, defining a pod template, results in all pods sharing the same task group name, resource, and parameter annotations. To distinguish the task groups of the head and the workers, the Ray plugin should update each Ray pod's annotations (task group name, placeholder resources, and parameters) based on its role when creating the Ray cluster. When the user sets "batch-scheduler" to "yunikorn", the Ray plugin will generate a separate task group name for each pod, together with the placeholder resources and parameters. The head pod's annotations carry the task group definitions, placeholder resources, and parameters, while pods belonging to a worker group get a task group name generated from their worker group.
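The per-role logic described above can be sketched as follows. This is an illustration only, not the actual plugin code (the real Ray plugin lives in Go in flyteplugins); the function name and group-naming scheme are assumptions based on the examples in this issue:

```python
def annotations_for_pod(role, worker_group_index=None):
    """Return the Yunikorn task-group annotations a Ray pod should carry,
    given its role: "head" for the head pod, or "worker" with the index
    of its worker group. Hypothetical sketch of the proposed mechanism.
    """
    if role == "head":
        name = "taskgroup-raycluster-head"
    else:
        # Each worker group gets its own task group name, so Yunikorn can
        # gang-schedule and monitor the groups independently.
        name = f"taskgroup-raycluster-worker-{worker_group_index}"
    return {"yunikorn.apache.org/taskGroupName": name}
```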

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

welcome[bot] commented 2 months ago

Thank you for opening your first issue here! 🛠

runllm[bot] commented 2 months ago

To address the issue of updating annotations for Ray cluster pods to adopt Yunikorn Gang scheduling, you can follow these steps:

  1. Define RayJobConfig: Use the RayJobConfig to specify different configurations for the head node and worker nodes, including annotations for Yunikorn Gang scheduling.

  2. Set Annotations: Add the required annotations for the head node and worker nodes to enable Yunikorn Gang scheduling.

Here's an example configuration:

from flytekit import task, Resources
from flytekitplugins.ray import RayJobConfig, HeadNodeConfig, WorkerNodeConfig

# NOTE: this example assumes HeadNodeConfig and WorkerNodeConfig accept
# `requests`, `limits`, and `annotations` fields; verify against the
# flytekitplugins-ray version you are using.

ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(
        requests=Resources(mem="64Gi", cpu="4"),
        limits=Resources(mem="64Gi", cpu="4"),
        annotations={
            "yunikorn.apache.org/schedulingPolicyParameters": "placeholderTimeoutSeconds=30",
            "yunikorn.apache.org/taskGroupName": "taskgroup-raycluster-head",
            "yunikorn.apache.org/taskGroup": """
                TaskGroups: [
                  {
                    name: taskgroup-raycluster-head,
                    minMember: 1,
                    minResource: {
                      cpu: 1,
                      memory: 500Mi,
                    },
                  },
                  {
                    name: taskgroup-raycluster-worker-0,
                    minMember: 5,
                    minResource: {
                      cpu: 2,
                      memory: 1Gi,
                    },
                  },
                  {
                    name: taskgroup-raycluster-worker-1,
                    minMember: 5,
                    minResource: {
                      cpu: 2,
                      memory: 1Gi,
                    },
                  },
                ]
            """
        }
    ),
    worker_node_config=[
        WorkerNodeConfig(
            group_name="worker-group-0",
            replicas=5,
            requests=Resources(mem="1Gi", cpu="2"),
            limits=Resources(mem="1Gi", cpu="2"),
            annotations={
                "yunikorn.apache.org/taskGroupName": "taskgroup-raycluster-worker-0"
            }
        ),
        WorkerNodeConfig(
            group_name="worker-group-1",
            replicas=5,
            requests=Resources(mem="1Gi", cpu="2"),
            limits=Resources(mem="1Gi", cpu="2"),
            annotations={
                "yunikorn.apache.org/taskGroupName": "taskgroup-raycluster-worker-1"
            }
        )
    ]
)

@task(task_config=ray_config)
def ray_task():
    # Task implementation
    pass

This configuration sets up the Ray cluster with the necessary annotations for Yunikorn Gang scheduling, distinguishing between the head node and worker nodes.

Would you like more details on any specific part of this configuration?



This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @runllm to ask follow-up questions. See runllm.com for more!

0yukali0 commented 2 months ago

I am about to finish this issue in the next few days; if you have any suggestions or other aspects to consider, please let me know so we can make this idea better. The following diagram shows my current progress; I will push this branch to my repo tonight.

[image]