dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Provide the ability to disable the `AutoMaterializePolicy` for individual assets #15504

Open danielgafni opened 1 year ago

danielgafni commented 1 year ago

What's the use case?

Sometimes (for example, when fixing a broken asset) one may need to temporarily hide the asset from the AutoMaterializeDaemon. It would be nice if this could be done from the UI.

Would it be a reasonable feature? Or should this only be controlled from code?

Ideas of implementation

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

What we've heard:

sryza commented 1 year ago

@danielgafni I do think this would be a reasonable feature

sryza commented 1 year ago

@mmutso-boku @danielgafni @pablo-statsig would you be up for adding a little more detail about situations where you'd envision using this feature, and how you'd envision using it?

Additionally, one of the things that we're thinking about is the UI- and general user-facing complexity of exposing this at the granularity of individual assets. We're evaluating the possibility of instead exposing it at the granularity of - say - groups. Curious about your thoughts on this.

danielgafni commented 1 year ago

Hey Sandy

So I usually need something like that when one of the assets breaks and has to be fixed.

Right now I stop the daemon globally, rush to deploy a fix, and turn the daemon back on.

I'm getting away with this approach since our asset graph isn't that big (yet) and most of the assets don't have to be super fresh, but I imagine it can still cause issues (for example, someone else might be waiting for some other assets to update, or I might forget to turn the daemon back on).

In general it feels wrong to stop the entire orchestration only to fix one small part of the system.

sryza commented 11 months ago

More feedback collected from Slack - https://dagster.slack.com/archives/C01U5LFUZJS/p1698367625371559?thread_ts=1698346105.629629&cid=C01U5LFUZJS:

Chris Comeau:

I can see some scenarios with auto-materialization where it would be useful to be able to "pause" auto-materialization of specific assets (or groups). Some examples: suppressing errors during a known outage window, or manual intervention on a table. For now we can do something similar just by toggling auto-materialize off globally, but that also suspends other, unaffected assets.

Me:

Question for you: when you say "specific assets (or groups)", would one of these be more valuable for you? Do you envision situations where you'd want to turn off an asset but not the other assets in the same group?

Chris:

Between the two I think I'd prefer the asset-specific one... covers a wider range of scenarios, maybe with more clicking if a whole group was affected but not a big deal. Thanks

mmutso-boku commented 10 months ago

@sryza Sorry for the late reply.

One of the use cases would be, as already mentioned, when an asset breaks or a bug is discovered. In such a case it would be nice to "cut" the dependency chain at the affected asset, so that downstream assets would not be materialized.
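To make the "cut the dependency chain" idea concrete, here is a minimal sketch in plain Python. It does not use Dagster and is not how the daemon works internally. The function name `blocked_assets`, the graph layout (asset name mapped to its upstream asset names), and the example asset names are all assumptions made for illustration.

```python
from collections import deque


def blocked_assets(upstream, paused):
    """Return the paused assets plus every asset downstream of them.

    `upstream` maps each asset name to the names it depends on
    (a hypothetical representation, not a Dagster data structure).
    """
    # Invert the upstream map into a downstream adjacency list.
    downstream = {name: [] for name in upstream}
    for name, deps in upstream.items():
        for dep in deps:
            downstream.setdefault(dep, []).append(name)

    # Breadth-first walk from each paused asset to everything below it.
    blocked = set(paused)
    queue = deque(paused)
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in blocked:
                blocked.add(child)
                queue.append(child)
    return blocked


# Hypothetical graph: raw -> cleaned -> report, and raw -> audit
graph = {
    "raw": [],
    "cleaned": ["raw"],
    "report": ["cleaned"],
    "audit": ["raw"],
}

# Pausing "cleaned" also holds back "report"; "raw" and "audit" stay eligible.
eligible = sorted(set(graph) - blocked_assets(graph, {"cleaned"}))
```

With a per-asset pause like this, only the affected branch is held back, which is the behavior a global daemon toggle cannot give.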

Another is when adding new assets and backfilling historic partitions. For various reasons, it is not always possible to backfill multiple assets in one go, so we need to do it one asset at a time. Quite often we also just need to have historic partitions "marked as materialized" (using a custom job), so kicking off an actual run is not desired in that case. If auto-materialize is configured for the downstream assets (as we would want once the assets have been "initialized"), it kicks off runs before I want that to happen. The current workaround has been to deploy the new assets without an AutoMaterializePolicy, backfill them or otherwise get them to the normal running state, and then deploy again with AutoMaterializePolicies.
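One way to soften the "deploy twice" workaround is to decide at definition time whether an asset gets a policy at all. The sketch below is an assumption-heavy illustration: the `PAUSED_ASSETS` environment variable and the helpers `paused_assets` and `policy_for` are hypothetical names, not part of Dagster. It would still need the code location to be reloaded to pick up a change, so it is not a substitute for the UI toggle requested here.

```python
import os


def paused_assets(env=os.environ):
    """Parse a comma-separated PAUSED_ASSETS variable into a set of names.

    PAUSED_ASSETS is a hypothetical variable name chosen for this sketch.
    """
    raw = env.get("PAUSED_ASSETS", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def policy_for(asset_name, default_policy, env=os.environ):
    """Return default_policy unless the asset is listed as paused."""
    return None if asset_name in paused_assets(env) else default_policy


# Intended use with Dagster (shown as a comment, not executed here):
#
# from dagster import AutoMaterializePolicy, asset
#
# @asset(auto_materialize_policy=policy_for("orders", AutoMaterializePolicy.eager()))
# def orders(): ...
```

An asset defined with `auto_materialize_policy=None` has no policy attached, which matches the "deploy without AutoMaterializePolicy" step of the workaround.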

> We're evaluating the possibility of instead exposing it at the granularity of - say - groups.

I guess this could be made to work. Our groups are quite large at the moment, so we would need to refactor them into smaller, more specific groups, so that it would be possible to disable only the affected part of the whole chain.

sryza commented 10 months ago

Thanks for the input @mmutso-boku - that makes a lot of sense.

chrishiste commented 7 months ago

A perfect candidate for this seems to be the UI toggle we can see on the asset group. Instead of having a global toggle, I would love to have a toggle for each asset, like we have for schedules.

CSRessel commented 5 months ago

I also think this would be a valuable feature. There isn't a very effective solution for these scenarios today: someone has to either pause the auto-materialization daemon or change the auto_materialize_policy on the asset. The backfill examples above, an asset that's producing bad data, and an asset with side effects requiring intervention all require a redeploy of the code location in order to pause the asset. I would also appreciate the ability to disable it individually on the asset page.

Daniel-Vetter-Coverwhale commented 3 months ago

RFC for the new auto-materialize work, for anyone who subscribed here but maybe hasn't seen it yet - https://github.com/dagster-io/dagster/discussions/22811