Open 0yukali0 opened 4 months ago
To update the annotations of Ray cluster pods so that they use Yunikorn gang scheduling, you can follow these steps:
Define RayJobConfig: Use RayJobConfig to specify separate configurations for the head node and the worker nodes, including the annotations for Yunikorn gang scheduling.
Set annotations: Add the required annotations to the head node and to each worker group to enable Yunikorn gang scheduling.
Here's an example configuration:
from flytekit import task, Resources
from flytekitplugins.ray import RayJobConfig, HeadNodeConfig, WorkerNodeConfig

ray_config = RayJobConfig(
    head_node_config=HeadNodeConfig(
        requests=Resources(mem="64Gi", cpu="4"),
        limits=Resources(mem="64Gi", cpu="4"),
        annotations={
            # Gang scheduling parameters, here the placeholder timeout.
            "yunikorn.apache.org/schedulingPolicyParameters": "placeholderTimeoutInSeconds=30",
            # Task group this pod belongs to.
            "yunikorn.apache.org/task-group-name": "taskgroup-raycluster-head",
            # All task groups of the application, as a JSON array.
            "yunikorn.apache.org/task-groups": """
            [
                {
                    "name": "taskgroup-raycluster-head",
                    "minMember": 1,
                    "minResource": {
                        "cpu": "1",
                        "memory": "500Mi"
                    }
                },
                {
                    "name": "taskgroup-raycluster-worker-0",
                    "minMember": 5,
                    "minResource": {
                        "cpu": "2",
                        "memory": "1Gi"
                    }
                },
                {
                    "name": "taskgroup-raycluster-worker-1",
                    "minMember": 5,
                    "minResource": {
                        "cpu": "2",
                        "memory": "1Gi"
                    }
                }
            ]
            """,
        },
    ),
    worker_node_config=[
        WorkerNodeConfig(
            group_name="worker-group-0",
            replicas=5,
            requests=Resources(mem="1Gi", cpu="2"),
            limits=Resources(mem="1Gi", cpu="2"),
            annotations={
                "yunikorn.apache.org/task-group-name": "taskgroup-raycluster-worker-0"
            },
        ),
        WorkerNodeConfig(
            group_name="worker-group-1",
            replicas=5,
            requests=Resources(mem="1Gi", cpu="2"),
            limits=Resources(mem="1Gi", cpu="2"),
            annotations={
                "yunikorn.apache.org/task-group-name": "taskgroup-raycluster-worker-1"
            },
        ),
    ],
)


@task(task_config=ray_config)
def ray_task():
    # Task implementation
    pass
This configuration sets up the Ray cluster with the necessary annotations for Yunikorn Gang scheduling, distinguishing between the head node and worker nodes.
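Writing the task group list by hand makes it easy for minMember or the resources to drift from the worker group settings. A minimal sketch of building that annotation value programmatically is shown here. The `TaskGroup` class and `task_groups_annotation` function are hypothetical helpers, not part of flytekit or Yunikorn, and the sketch assumes the annotation value is a JSON array with `name`, `minMember`, and `minResource` fields.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskGroup:
    """Hypothetical helper type describing one gang-scheduling task group."""

    name: str
    min_member: int
    cpu: str
    memory: str

    def to_dict(self) -> dict:
        # Field names follow the task group definition in the example above.
        return {
            "name": self.name,
            "minMember": self.min_member,
            "minResource": {"cpu": self.cpu, "memory": self.memory},
        }


def task_groups_annotation(groups: List[TaskGroup]) -> str:
    """Serialize the task groups into a single JSON array string."""
    names = [g.name for g in groups]
    if len(names) != len(set(names)):
        raise ValueError("task group names must be unique")
    return json.dumps([g.to_dict() for g in groups])


# Same groups as in the example configuration above.
groups = [
    TaskGroup("taskgroup-raycluster-head", 1, "1", "500Mi"),
    TaskGroup("taskgroup-raycluster-worker-0", 5, "2", "1Gi"),
    TaskGroup("taskgroup-raycluster-worker-1", 5, "2", "1Gi"),
]
TASK_GROUPS_JSON = task_groups_annotation(groups)
```

The resulting string could then be used as the value of the task group annotation on the head node, with each worker group's `min_member` taken from its `replicas`.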
Would you like more details on any specific part of this configuration?
This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.
I expect to finish this issue within the next few days. If you have any suggestions or concerns, please let me know so we can improve the idea. The following diagram shows my progress so far; I will push this branch to my repo tonight.
Motivation: Why do you think this is important?
Implementing this mechanism spares users from updating annotations by hand. It also makes it possible to give the head pod and each worker group its own task group name, so that the status of each group can be monitored under Yunikorn gang scheduling.
Goal: What should the final outcome look like, ideally?
Annotations of the pod belonging to the head group
Pods of worker group 0
name: taskgroup-raycluster-worker-0
Pods of worker group 1
name: taskgroup-raycluster-worker-1
Describe alternatives you've considered
A mechanism that updates each pod's annotations according to the group it belongs to. The currently available way to enable Yunikorn gang scheduling, defining a pod template, results in all pods sharing the same task group name, resource, and parameter annotations. To distinguish the task groups of the head and the workers, the Ray plugin would set the annotations (task group name, resources, and parameters) on each Ray pod based on its role when creating the Ray cluster. When the user sets "batch-scheduler" to "yunikorn", the Ray plugin generates the task group name for each pod, together with the placeholder resources and the parameters. The head pod's annotations carry the task group names, the placeholders' resources, and the parameters, while pods belonging to a worker group carry the task group name generated from their worker group.
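The role-based behaviour described above can be sketched roughly as follows. This is an illustration in Python only, not the plugin's actual code: `task_group_name` and `pod_annotations` are hypothetical names, the naming scheme is modeled on the task group names in the Goal section, and the annotation keys are the hyphenated ones from the Yunikorn documentation as far as I know, so please verify them against the Yunikorn version in use.

```python
from typing import Dict

# Annotation keys as I understand them from the Yunikorn docs (assumption).
TASK_GROUP_NAME_KEY = "yunikorn.apache.org/task-group-name"
TASK_GROUPS_KEY = "yunikorn.apache.org/task-groups"
POLICY_PARAMS_KEY = "yunikorn.apache.org/schedulingPolicyParameters"


def task_group_name(role: str, group_index: int = 0) -> str:
    """Hypothetical naming scheme: one name for the head, one per worker group."""
    if role == "head":
        return "taskgroup-raycluster-head"
    if role == "worker":
        return f"taskgroup-raycluster-worker-{group_index}"
    raise ValueError(f"unknown role: {role!r}")


def pod_annotations(
    batch_scheduler: str,
    role: str,
    group_index: int = 0,
    task_groups_json: str = "[]",
    policy_params: str = "",
) -> Dict[str, str]:
    """Return the gang-scheduling annotations for one Ray pod."""
    # Only act when the user selected Yunikorn as the batch scheduler.
    if batch_scheduler != "yunikorn":
        return {}

    # Every pod gets the task group name derived from its own group.
    annotations = {TASK_GROUP_NAME_KEY: task_group_name(role, group_index)}

    # Only the head pod carries the full task group list and the parameters.
    if role == "head":
        annotations[TASK_GROUPS_KEY] = task_groups_json
        if policy_params:
            annotations[POLICY_PARAMS_KEY] = policy_params
    return annotations
```

For example, `pod_annotations("yunikorn", "worker", group_index=1)` yields a single annotation naming `taskgroup-raycluster-worker-1`, while any other scheduler value leaves the pod's annotations untouched.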
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?