dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

Set a singular "current_time" for microbatch models per invocation #10819

Closed QMalcolm closed 1 month ago

QMalcolm commented 1 month ago

Currently, microbatch models get a "current_time" (datetime.datetime.now(pytz.UTC)) when they are executed. Notably, each microbatch model gets a different "current_time". This works, but it means that models within the same invocation can see different cutoffs for the same upstream data.
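To make the difference concrete, here is a minimal sketch contrasting a timestamp evaluated per model with one captured once per invocation. It is not dbt's implementation: `per_model_current_time` and `Invocation` are hypothetical names, and it uses the standard library's `timezone.utc` in place of `pytz.UTC` to stay self-contained.

```python
from datetime import datetime, timezone


def per_model_current_time() -> datetime:
    # Behavior described above: evaluated each time a model is executed,
    # so two models in the same run get two different values.
    return datetime.now(timezone.utc)


class Invocation:
    """Hypothetical holder for one timestamp shared by every model in a run."""

    def __init__(self) -> None:
        # Captured once, when the invocation starts.
        self.current_time = datetime.now(timezone.utc)


invocation = Invocation()

# Every microbatch model reads the same value, however late it runs.
time_for_model_a = invocation.current_time
time_for_model_b = invocation.current_time
```

With a shared value, any delay between the two models no longer changes the cutoff each one uses.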

Consider the following:

  1. There is a source, source_1, which is constantly being updated by an external process
  2. Microbatch model model_a pulls from source_1
  3. Microbatch model model_b pulls from source_1

Regardless of whether the project is small or large, if there is any delay between the execution of model_a and model_b and new data arrives in source_1 in the meantime, the result of model_a will differ from that of model_b. An example would be:

  1. source_1 has 3 rows with event times: 2024-10-07 12:17:00, 2024-10-07 12:16:00, 2024-10-07 12:15:00
  2. the "current_time" is 2024-10-07 12:18:00
  3. model_a is executed, picking up the 3 rows
  4. new data arrives in source_1 with event time 2024-10-07 12:18:30
  5. the "current_time" is 2024-10-07 12:19:00
  6. model_b is executed, picking up 4 rows
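The sequence above can be replayed as a small simulation. This is only a sketch: `rows_visible` is a hypothetical helper that assumes a model picks up every row whose event time is at or before its "current_time", which simplifies how microbatch models actually bound their batches.

```python
from datetime import datetime


def ts(value: str) -> datetime:
    # Parse the timestamp format used in the example.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def rows_visible(source_rows, current_time):
    """Hypothetical filter: rows whose event time is not after current_time."""
    return [row for row in source_rows if row <= current_time]


# Step 1: source_1 starts with 3 rows.
source_1 = [
    ts("2024-10-07 12:17:00"),
    ts("2024-10-07 12:16:00"),
    ts("2024-10-07 12:15:00"),
]

# Steps 2-3: model_a runs with its own current_time.
model_a_rows = rows_visible(source_1, ts("2024-10-07 12:18:00"))

# Step 4: a new row lands in source_1 before model_b runs.
source_1.append(ts("2024-10-07 12:18:30"))

# Steps 5-6: model_b runs with a later current_time.
model_b_rows = rows_visible(source_1, ts("2024-10-07 12:19:00"))

# For comparison: model_b reusing model_a's current_time.
model_b_shared_rows = rows_visible(source_1, ts("2024-10-07 12:18:00"))

print(len(model_a_rows), len(model_b_rows), len(model_b_shared_rows))  # 3 4 3
```

Under these assumptions, the late row is excluded from model_b only when both models use the same "current_time".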

Is the discrepancy okay?