dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Dagster integration for Apache Beam profiling #2429

Open terekete opened 4 years ago

terekete commented 4 years ago

Hi,

Are there any integrations for using Dagster to profile data in something like Apache Beam / Dataflow jobs? Is there a way to integrate Dagster into a multistep pipeline?

Thanks, M.

natekupp commented 4 years ago

Hi @terekete, apologies for the long delay here. We don't have anything for Beam/Dataflow yet, but would be happy to chat about any ideas you have!

MattOates commented 5 months ago

Hello, I'd be interested in working on this. We have just chosen Dagster for our data pipeline orchestration, and we have a lot of pre-existing and new Beam jobs to orchestrate. Beam is essentially our chosen distributed compute/ETL layer on GCP. I found this example, and very little else, on how one might go about this: https://discuss.dagster.io/t/16369683/can-i-use-apache-beam-multiprocessing-to-queue-google-cloud-

Given the very recent Pipes, I was wondering whether there is a nicer way to do this, where an asset actually blocks while the Beam job executes, to avoid things like having to use sensors after launching. Any hints, tips, or design direction toward something useful for the community would be very welcome.
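
To make the "asset blocks while the job executes" idea concrete, here is a minimal, library-agnostic sketch of the waiting part only. It is not Dagster or Beam API: `wait_for_job`, `get_state`, `JobFailedError`, and the state strings are hypothetical names chosen for illustration (the states are loosely modelled on Beam's pipeline states). In the Beam Python SDK, `PipelineResult.wait_until_finish()` plays a similar role, so an asset body could presumably call that instead of a hand-written loop.

```python
import time
from typing import Callable

# Hypothetical terminal states, loosely modelled on Beam's pipeline states.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELLED"}


class JobFailedError(RuntimeError):
    """Raised when the job ends in a terminal state other than DONE."""


def wait_for_job(
    get_state: Callable[[], str],
    poll_interval_s: float = 10.0,
    timeout_s: float = 3600.0,
) -> str:
    """Block until get_state() reports DONE; raise on failure or timeout.

    get_state is an assumed callable that returns the current job state,
    for example by querying the runner's job API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "DONE":
            return state
        if state in TERMINAL_STATES:
            raise JobFailedError(f"job ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Usage with a fake job that reports a fixed sequence of states.
fake_states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
final_state = wait_for_job(lambda: next(fake_states), poll_interval_s=0.0)
```

Called from inside an asset function, a loop like this keeps the step running until the job finishes, so a failed job fails the asset directly and no post-launch sensor is needed. The trade-off is that the orchestrator process stays occupied for the lifetime of the job.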