dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Allow passing `executor` to `asset`, `graph_asset` and `graph_multi_asset` #14539

Open danielgafni opened 1 year ago

danielgafni commented 1 year ago

What's the use case?

Currently the only way to use a K8s executor with assets is to set it as the code location's default via `Definitions(executor=...)`. It would be much better if the executor could also be set at the asset level (just like with jobs). Am I missing something?
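For reference, a minimal sketch (not from the issue) of the current situation, assuming `dagster` and `dagster-k8s` are installed; asset names are illustrative. The executor can only be attached to the whole code location via `Definitions`, so every asset in it inherits the K8s executor:

```python
from dagster import Definitions, asset
from dagster_k8s import k8s_job_executor


@asset
def heavy_asset():
    # actually benefits from running in a dedicated k8s step
    return 42


@asset
def light_asset(heavy_asset):
    # does not need k8s isolation, but inherits the executor anyway
    return heavy_asset + 1


defs = Definitions(
    assets=[heavy_asset, light_asset],
    executor=k8s_job_executor,  # applies to every run in this code location
)
```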

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

sryza commented 1 year ago

Currently, we have a system constraint that each run has exactly one executor. We also allow launching a run that materializes an arbitrary subset of the assets within a code location. If different assets within a code location could have different executors, it would be possible to submit a single run whose assets require different executors.

So to implement this, we'd need to have an answer to that problem, either by allowing runs to have multiple executors (this would be a big change) or restricting runs within a code location to assets with the same executor.

ion-elgreco commented 9 months ago

@sryza can you at least make it possible to run a set of steps (ops) together? That would allow you to pass the data between the ops in memory instead of having to persist it.

danielgafni commented 9 months ago

@ion-elgreco I'm pretty sure the default executor already allows that:

https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#default-job-executor
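For completeness, here's a minimal sketch (not from the thread) of one way to keep data between ops in memory: run all steps in a single process with the explicit in-process executor and use the in-memory IO manager so intermediate outputs are never persisted. Op names are illustrative.

```python
from dagster import in_process_executor, job, mem_io_manager, op


@op
def produce():
    return [1, 2, 3]


@op
def consume(values):
    return sum(values)


@job(
    executor_def=in_process_executor,              # every step shares one process
    resource_defs={"io_manager": mem_io_manager},  # outputs stay in memory
)
def in_memory_job():
    consume(produce())
```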

Edit: realized you want to run arbitrary (or maybe graph-constrained) sets of ops together

derHeinzer commented 2 months ago

Hi @danielgafni,

Are there any plans to implement this feature?

In my case, most of the assets in my code location don't require the k8s_job_executor and would work best with the default executor. However, there are a few assets where I need to set Kubernetes-specific configuration (such as resource requests and node pool selectors).

Currently, I can only achieve this in two ways:

1. Setting the k8s_job_executor as the default executor, which introduces unnecessary overhead for most asset materializations.
2. Explicitly setting the k8s_job_executor for specific assets, which has its own downsides.

The problem with the second approach is that when materializing those assets ad hoc, the default executor is used instead of the k8s_job_executor. It's easy to forget to always use the respective job instead of just hitting "materialize", especially in a larger team. Moreover, jobs that materialize assets using different executors need to be separated, which adds complexity.
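A minimal sketch (not from the thread) of the kind of per-asset Kubernetes settings mentioned above, using the `dagster-k8s/config` tag; all values are illustrative. As far as I understand, these step-level tags only take effect when the step actually runs under the k8s_job_executor, which is exactly why the executor choice matters here.

```python
from dagster import asset


@asset(
    op_tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": "2", "memory": "8Gi"},
                    "limits": {"cpu": "4", "memory": "16Gi"},
                },
            },
            "pod_spec_config": {
                # route the step pod to a dedicated node pool
                "node_selector": {"cloud.google.com/gke-nodepool": "highmem-pool"},
            },
        },
    }
)
def heavy_training_asset():
    return "trained model"
```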

I would appreciate any insights or plans on addressing this!

Thanks.

danielgafni commented 1 month ago

Hey @derHeinzer!

The concerns you raised are completely valid. As a workaround, you could split your definitions into two code locations, one of them having a non-default executor set at the Definitions level.
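A minimal sketch (not from the thread) of that two-code-location workaround: each module exposes its own Definitions, and only the second one sets the K8s executor. Module, asset, and attribute names are illustrative.

```python
from dagster import Definitions, asset
from dagster_k8s import k8s_job_executor


@asset
def light_asset():
    return 1


@asset
def heavy_asset():
    return 2


# my_project/default_location.py would contain something like:
default_defs = Definitions(assets=[light_asset])  # default executor

# my_project/k8s_location.py would contain something like:
k8s_defs = Definitions(assets=[heavy_asset], executor=k8s_job_executor)

# workspace.yaml then loads both locations, e.g.:
#
#   load_from:
#     - python_module:
#         module_name: my_project.default_location
#         attribute: default_defs
#     - python_module:
#         module_name: my_project.k8s_location
#         attribute: k8s_defs
```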

I don't know if we currently have any plans for addressing this problem. Let's ping @schrockn to see if he has a better answer.

schrockn commented 1 month ago

The most natural way to specialize the executor below the Definitions level is to specify the executor in define_asset_job, since we have the constraint of a single executor per run.
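A minimal sketch (not from the thread) of that suggestion: keep the default executor at the Definitions level and attach the K8s executor to a dedicated asset job. Names are illustrative.

```python
from dagster import Definitions, asset, define_asset_job
from dagster_k8s import k8s_job_executor


@asset
def heavy_asset():
    return "big result"


heavy_k8s_job = define_asset_job(
    "heavy_k8s_job",
    selection=[heavy_asset],
    executor_def=k8s_job_executor,  # one executor per run, so it lives on the job
)

defs = Definitions(assets=[heavy_asset], jobs=[heavy_k8s_job])
```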