dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Preferred executor with Computing cluster (Dask) #16013

Open aeroaks opened 1 year ago

aeroaks commented 1 year ago

What's the use case?

We have computation tasks that we execute using Dask. From the blogs and documentation I have read, the best approach seems to be to define the cluster as a resource and use it in the assets.

With tasks like these, using dagster-dask as the executor does not seem to be an option. Which executor is generally used in such a scenario?

Why can a Dask cluster not be used both as the executor and for parallelising the computation within a task?

alangenfeld commented 1 year ago

The current DaskExecutor is quite old. It works by attempting to translate the Dagster DAG into a Dask DAG and submitting it to Dask for orchestration. Working this way means it does not support many Dagster features that are managed at the executor/orchestration layer. This is one reason the Dask executor is not generally recommended.

aeroaks commented 1 year ago

Thanks. Hopefully this helps others looking for the same answer.

So, given that I would be using Dask in my compute, and that defining Dask as a resource is quite useful in Dagster, which is the preferred executor for this approach? Also, in general, which executor is preferable?

alangenfeld commented 1 year ago

The default multiprocess/in-process executor is probably the preferable choice in general: https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#controlling-job-execution