Open ssillaots-boku opened 4 weeks ago
cc @OwenKephart
Hi @ssillaots-boku ! This is an interesting edge case -- in essence, if the child updates on the same tick that a new parent update comes in, then we treat the parent update as "handled by" the new child update (because the child updated "at the same time as" the parent), even if those executions were in different runs.
The eager policy waits for all in progress work to complete before kicking off a run, so by the time its run completes, everything appears to be ok from the perspective of the system.
Another way of looking at it is that the frequency of evaluations determines the "resolution" with which we can determine "event_a came before event_b". Typically, the expectation is that a single asset will be materialized much less often than once every 30 seconds, so the cases in which this would occur would be fairly rare.
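To make the "resolution" point concrete, here is a toy model in plain Python. It is only a sketch and not Dagster's implementation: TICK_SECONDS, tick_of, and parent_update_looks_handled are hypothetical names, and the single assumption is that events are compared by the evaluation tick on which they are first observed rather than by their exact timestamps.

```python
TICK_SECONDS = 30  # assumed evaluation interval, taken from the "30 seconds" figure above


def tick_of(timestamp: float) -> int:
    """Index of the first evaluation tick that can observe an event at `timestamp`."""
    return int(timestamp // TICK_SECONDS) + 1


def parent_update_looks_handled(parent_updated_at: float, child_updated_at: float) -> bool:
    """Toy rule: the parent update counts as handled if the child update is
    observed on the same tick as the parent update, or on a later one."""
    return tick_of(child_updated_at) >= tick_of(parent_updated_at)


# The child finished at t=35 and the parent updated afterwards at t=50. Both are
# first seen on tick 2, so the parent update is treated as already handled.
same_tick = parent_update_looks_handled(parent_updated_at=50, child_updated_at=35)

# If the parent update lands one tick later (t=65, tick 3), the order is visible
# and the parent update is still outstanding.
separate_ticks = parent_update_looks_handled(parent_updated_at=65, child_updated_at=35)
```

In this model same_tick is True and separate_ticks is False, which mirrors why an extra tick between the two upstream materializations changes the outcome.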
Removing the since_last_handled() condition will work for this test case, but will yield incorrect results in "real-world" usage. In particular, imagine you have A -> B -> C, and you manually materialize A. B will be requested, and because C can see that B is getting requested this tick, it will also get requested (any_deps_updated checks for newly_updated | will_be_requested).
So you have a run that will materialize both of them, and that starts executing. This run will materialize B, which then will be parsed by the system and result in a second run of C, which is not the desired behavior.
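The A -> B -> C scenario can be sketched as a small simulation. This is a simplified model written for illustration, not Dagster's evaluation code: DEPS, evaluate_tick, simulate, and the pending dictionary are hypothetical, and the model assumes that an asset's own update (or a request for it) resets the "since last handled" state.

```python
DEPS = {"B": ["A"], "C": ["B"]}  # the A -> B -> C chain described above


def evaluate_tick(newly_updated, pending, use_since_last_handled):
    """Return the assets requested on one tick of the toy model.

    newly_updated: assets whose materialization is first observed on this tick.
    pending: per-asset flag meaning "a parent update has not been handled yet".
    """
    requested = []
    for asset in ("B", "C"):  # evaluate in dependency order
        # Mirrors "newly_updated | will_be_requested" for the asset's parents.
        deps_updated = any(dep in newly_updated or dep in requested for dep in DEPS[asset])
        if not use_since_last_handled:
            fire = deps_updated
        elif asset in newly_updated:
            # The asset's own update resets the condition on this tick.
            pending[asset] = False
            fire = False
        else:
            pending[asset] = pending[asset] or deps_updated
            fire = pending[asset]
        if fire:
            requested.append(asset)
            pending[asset] = False  # a request also counts as handling
    return requested


def simulate(use_since_last_handled):
    pending = {"B": False, "C": False}
    tick1 = evaluate_tick({"A"}, pending, use_since_last_handled)
    # Assume the run from tick 1 completes and its outputs are observed on tick 2.
    tick2 = evaluate_tick(set(tick1), pending, use_since_last_handled)
    return tick1, tick2


with_handled = simulate(use_since_last_handled=True)
without_handled = simulate(use_since_last_handled=False)
```

Under these assumptions, with_handled is (["B", "C"], []) while without_handled is (["B", "C"], ["C"]): once the handled check is dropped, the materialization of B produced by the first run triggers a redundant second request for C.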
This test should pass if you put an extra tick (that emits 0 runs) in between the upstream materializations.
If this test case represents a real-world situation you ran into (and isn't just an artifact of the time scales of unit tests being much shorter than real life), then you could use AutomationCondition.any_deps_match(AutomationCondition.newly_updated()) instead.
This will result in your asset being requested as soon as any parent update is detected, and will ignore any in progress runs etc. However, this also means that runs will not be chained together (i.e. if you have A -> B -> C, and update A, then the system will create a run for just B in response, and then once that completes, it will create a separate run for C).
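The loss of chaining can be illustrated with another toy model. Again this is only a sketch under stated assumptions, not Dagster code: DEPS, requests_for_tick, and runs_after_updating are hypothetical names, and the model assumes each run completes before the next tick and that the condition only looks at parents observed as newly updated.

```python
DEPS = {"B": ["A"], "C": ["B"]}  # A -> B -> C


def requests_for_tick(newly_updated):
    """Assets requested when only newly updated parents are considered
    (parents that will be requested on the same tick are ignored)."""
    return [
        asset
        for asset, deps in DEPS.items()
        if any(dep in newly_updated for dep in deps)
    ]


def runs_after_updating(root):
    """List the runs (one list of assets per tick) that follow an update of `root`."""
    runs = []
    newly_updated = {root}
    while newly_updated:
        requested = requests_for_tick(newly_updated)
        if not requested:
            break
        runs.append(requested)
        # Assume the run finishes and its outputs are observed on the next tick.
        newly_updated = set(requested)
    return runs


runs = runs_after_updating("A")
```

Here runs is [["B"], ["C"]]: B and C end up in two separate runs on consecutive ticks instead of one combined run.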
Dagster version
1.8.8
What's the issue?
I have a feeling that AutomationCondition.any_deps_updated().since_last_handled() doesn't work as expected. There is a chance that I'm just misunderstanding the definition and the flow/action is intended.
This example is based on the default AutomationCondition.eager(). The scenario is that there are 2 upstream assets and 1 downstream asset (the downstream asset depends on both upstream assets). For the sake of simplicity, the 2 upstream assets are manually materialized and the downstream asset has AutomationCondition.eager(). All assets are daily partitioned. I'll include sensor ticks in the steps.
For the 7th step I anticipated that the downstream asset would be triggered, but it isn't.
What did you expect to happen?
I then created a custom AutomationCondition which is almost identical to eager(), the only difference being that I removed .since_last_handled(). With this AutomationCondition it worked as expected.
How to reproduce?
I created a test. Hopefully that is enough.
Deployment type
Dagster Helm chart
Deployment details
No response
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.