dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Multiprocessing for asset jobs while executing substituent graph-backed assets in-process #11101

Open erinov1 opened 1 year ago

erinov1 commented 1 year ago

What's the use case?

It can be useful to build complex assets from a graph of composable ops. However, if those ops are very lightweight, you may not want to spawn a new subprocess to execute each one. On the other hand, other assets in the same asset job may need to be executed in their own processes.

For example, consider an asset job JOB consisting of two assets, ASSET_1 and ASSET_2, where ASSET_1 is graph-backed and built from two ops, OP_A and OP_B. I'd like to be able to launch a run of JOB that executes ASSET_1 and ASSET_2 each in its own process, without spawning separate processes for OP_A and OP_B (in other words, OP_A and OP_B run in the process spawned for ASSET_1, and handle their I/O via an IOManager).

Unless I'm mistaken, that does not currently seem possible?
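
To make the requested behavior concrete, here is a rough plain-Python model of the ASSET_1 / ASSET_2 example above. It is only an illustration and does not use Dagster or any Dagster API: the function names, the `run_job` helper, and the queue used to collect results are all hypothetical stand-ins, and the `fork` start method is assumed (so Unix only). Each asset gets its own process, while the two ops of the graph-backed asset run inside that asset's process.

```python
import multiprocessing as mp
import os


def op_a():
    # Stand-in for OP_A: a lightweight step that records which process ran it.
    return {"value": 1, "pid": os.getpid()}


def op_b(upstream):
    # Stand-in for OP_B: consumes OP_A's output in the same process.
    return {"value": upstream["value"] + 1, "pid": os.getpid()}


def asset_1(queue):
    # Stand-in for the graph-backed ASSET_1: both ops are plain function
    # calls here, so no extra process is spawned per op.
    a = op_a()
    b = op_b(a)
    queue.put(("ASSET_1", {"pid": os.getpid(),
                           "op_pids": [a["pid"], b["pid"]],
                           "value": b["value"]}))


def asset_2(queue):
    # Stand-in for the ordinary ASSET_2.
    queue.put(("ASSET_2", {"pid": os.getpid(), "op_pids": [], "value": 42}))


def run_job():
    # Hypothetical "run of JOB": one child process per asset.
    # Assumption: the "fork" start method is available (Unix only).
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=target, args=(queue,))
             for target in (asset_1, asset_2)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so the children can exit cleanly.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    for name, info in sorted(run_job().items()):
        print(name, info)
```

In this model the PIDs recorded for OP_A and OP_B match the PID of ASSET_1's process, while ASSET_2 reports a different PID. That is the process layout I'm hoping an asset job could produce.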

Ideas of implementation

No response

Additional information

No response

sryza commented 1 year ago

@erinov1 you're right that this is not currently possible. It would be a cool thing to add. It would likely be a large undertaking, though, because it would mean going from one executor within a run to multiple executors within a run.