dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
10.63k stars 1.32k forks source link

Using Ray distributed computing as an Executor #18213

Open aeroaks opened 7 months ago

aeroaks commented 7 months ago

What's the use case?

We use Ray in our setup to distribute computing. Having seen a Dagster-dask Executor. I was wondering what and how can we support a cluster, like Ray, as an executor.

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

mkleinbort commented 2 months ago

Hey, keen to see a ray executor. I came across this project https://github.com/danielgafni/dagster-ray but it looks to be in its very early stages.

Any plans or guidance to support ray executors?

garethbrickman commented 2 months ago

cc @danielgafni for your awareness!

danielgafni commented 2 months ago

Hey!

This is definitely something I would like to finish.

I've written and have been using a RayClusterResource (backed by KubeRay) in production with great success.

I did not do an Executor, just with a Resource.

I'm planning to open-source it. I do have more free time right now for this project.

A Ray executor/resource could be backed by a lot of different compute environments. It can be something general like Kubernetes, or more cloud-specific service like AWS EC2.

I think we should start with Kubernetes as it would benefit a wide range of users and would also be easier to test. Personally I'm not interested in implementing other backends (apart from the trivial local one), but it might make sense to add them in the future.

danielgafni commented 2 months ago

Hey everyone!

I added a KubeRayCluster resource to dagster-ray.

While it's not an Executor, it also allows executing Ray code in an auto-provisioned RayCluster on Kubernetes.

It works in minikube, but I would like to test it with a real cluster. Currently I don't have one.

Please reach out to me if you want to try it out.

Should be as easy as:

  1. Have KubeRay Operator installed in k8s cluster
  2. Configure the Resource according to your needs (e.g. max_nodes, pod resources)
  3. Request the resource in an asset:
@asset
def my_asset(ray_cluster: KubeRayCluster): 
    # write ray code here, at this point you are already connected to the RayCluster