Currently, microbatch models get a "current_time" (datetime.datetime.now(pytz.UTC)) when they are executed. Notably, each microbatch model gets a different "current_time". This works, but has some funkiness.
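To make the described behavior concrete, here is a minimal Python sketch. It assumes a runner that reads the clock once per model, at the moment that model starts executing. The `run_models` helper and the fake clock are hypothetical, not dbt's internals, and `datetime.timezone.utc` stands in for `pytz.UTC` so the example needs only the standard library.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable


def run_models(models: Iterable[str], clock: Callable[[], datetime]) -> Dict[str, datetime]:
    """Hypothetical runner: each model reads its own current_time when it starts."""
    current_times: Dict[str, datetime] = {}
    for model in models:
        # The clock is read once per model, so a model that runs later sees a later time.
        current_times[model] = clock()
    return current_times


# Fake clock so the example is deterministic: model_a starts at 12:18, model_b at 12:19.
ticks = iter([
    datetime(2024, 10, 7, 12, 18, tzinfo=timezone.utc),
    datetime(2024, 10, 7, 12, 19, tzinfo=timezone.utc),
])
times = run_models(["model_a", "model_b"], lambda: next(ticks))
print(times["model_a"].isoformat())  # 2024-10-07T12:18:00+00:00
print(times["model_b"].isoformat())  # 2024-10-07T12:19:00+00:00
```

In a real run the clock would be something like `lambda: datetime.now(timezone.utc)`; the fake clock only pins down the gap between the two models.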
Consider the following:
- There is a source, source_1, which is constantly being updated by an external process
- Microbatch model model_a pulls from source_1
- Microbatch model model_b pulls from source_1
Regardless of whether we're in a small or large project, given any delay between the execution of model_a and model_b, if new data lands in source_1 in the meantime, the result of model_a will differ from that of model_b. An example would be:
- source_1 has 3 rows with event times: 2024-10-07 12:17:00, 2024-10-07 12:16:00, 2024-10-07 12:15:00
- 2024-10-07 12:18:00: model_a is executed, picking up the 3 rows
- A new row is added to source_1: 2024-10-07 12:18:30
- 2024-10-07 12:19:00: model_b is executed, picking up 4 rows

Is the discrepancy okay?
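The timeline above can be replayed with a small simulation. This is a sketch under stated assumptions: the `run_model` helper is hypothetical, each row is assumed to land in source_1 at its event time, and a model is assumed to keep only rows whose event time is at or before its current_time. It compares the per-model current_time described in this issue with a hypothetical single timestamp captured once per invocation.

```python
from datetime import datetime, timezone
from typing import List


def ts(hour: int, minute: int, second: int = 0) -> datetime:
    """Build a UTC timestamp on 2024-10-07, the date used in the example."""
    return datetime(2024, 10, 7, hour, minute, second, tzinfo=timezone.utc)


# Event times in source_1; each row is assumed to land at its event time.
SOURCE_1: List[datetime] = [ts(12, 15), ts(12, 16), ts(12, 17), ts(12, 18, 30)]


def run_model(executed_at: datetime, current_time: datetime) -> List[datetime]:
    """Hypothetical model run: return the rows the model would pick up."""
    # Rows that physically exist in source_1 when the model runs.
    landed = [t for t in SOURCE_1 if t <= executed_at]
    # Assumed filter: only events up to the model's current_time.
    return [t for t in landed if t <= current_time]


# Described behavior: each model uses its own execution time as current_time.
a = run_model(executed_at=ts(12, 18), current_time=ts(12, 18))
b = run_model(executed_at=ts(12, 19), current_time=ts(12, 19))
print(len(a), len(b))  # 3 4

# Hypothetical alternative: one current_time captured at invocation start (12:18:00).
shared = ts(12, 18)
a2 = run_model(executed_at=ts(12, 18), current_time=shared)
b2 = run_model(executed_at=ts(12, 19), current_time=shared)
print(len(a2), len(b2))  # 3 3
```

Under these assumptions a shared timestamp only hides rows whose event time is after it. A late-arriving row with an earlier event time would still be seen by model_b but not by model_a, so a shared current_time narrows the discrepancy rather than ruling it out.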