dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-2699] [Feature] "interactive" compile should include the compiled code of a snapshot #7867

Open nathangriffiths-cdx opened 1 year ago

nathangriffiths-cdx commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

With dbt 1.5, the compile command was extended to allow "interactive" compilation of an arbitrary node, using syntax such as: dbt compile --select name_of_node

In our testing we found this seems to work for all types of node, including snapshots.

However, a plain dbt compile does not support snapshots. The documentation states "dbt compile generates executable SQL from source model, test, and analysis files", i.e. it does not include snapshots. Indeed, when dbt compile is run for a project, no code files are created for snapshots in "/target/compiled/..".

This has created an inconsistency where one usage of compile will generate output for a snapshot but another will not.

This is not a bug as such, but I think the inconsistent behaviour is potentially confusing for users. It's also not clear why standard compile doesn't already work for snapshots, since the interactive version suggests it is possible.

Expected Behavior

All usages of dbt compile should produce compiled code for snapshots, i.e. non-interactive uses of compile should generate the same compiled code as interactive use.

Steps To Reproduce

  1. Create a snapshot file
  2. Run dbt compile
  3. Note no output produced under "/target/compiled/"
  4. Run dbt compile --select <name_of_snapshot>
  5. Note compiled code output to command line

Relevant log output

No response

Environment

- OS: Windows
- Python: 3.10.10
- dbt: 1.5.0

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

dbeatty10 commented 1 year ago

Thanks for reaching out, @nathangriffiths-cdx!

Is your request basically that the SQL for snapshots would be written to target/compiled/your_project/snapshots/ when you run dbt compile?

nathangriffiths-cdx commented 1 year ago

> Thanks for reaching out, @nathangriffiths-cdx!
>
> Is your request basically that the SQL for snapshots would be written to target/compiled/your_project/snapshots/ when you run dbt compile?

Yes, that's correct. I was actually unaware this didn't already happen until some engineers we were training on dbt tried to find the compiled SQL for snapshots and asked me where it was. This is inconsistent with other dbt models and a bit confusing for new users.

jtcohen6 commented 1 year ago

It looks like we explicitly exclude snapshots & seeds:

https://github.com/dbt-labs/dbt-core/blob/9836f7bdef00b7cb912ad88ab1bbbfbd4dc4b312/core/dbt/compilation.py#L552-L564

This makes sense for seeds, since they aren't actually "compiled." For snapshots, it looks like this logic goes way back, all the way to when they used to be called archives and were defined as config-only in dbt_project.yml. I'm guessing we didn't use to write their compiled code because, when defined in dbt_project.yml, where would you write it to?
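For anyone who can't follow the permalink, the exclusion described above boils down to a resource-type check before anything is written to disk. The following is only a rough sketch of that idea, not the dbt-core source: `Node`, `WRITE_EXCLUDED`, and `should_write_compiled` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the resource types that are skipped when
# compiled SQL is written out. Not dbt-core's real constants.
WRITE_EXCLUDED = {"snapshot", "seed"}


@dataclass
class Node:
    """Hypothetical, stripped-down node; not dbt-core's real node class."""
    name: str
    resource_type: str
    compiled_code: Optional[str] = None


def should_write_compiled(node: Node) -> bool:
    """Return True if this node's compiled SQL would be written to disk."""
    if node.resource_type in WRITE_EXCLUDED:
        # Snapshots and seeds are skipped outright, even if compiled code exists.
        return False
    return node.compiled_code is not None
```

Under this sketch, a snapshot with perfectly valid compiled code is still skipped, which matches the behaviour reported in the issue.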

That's still a problem today, because multiple snapshots can share the same compiled_path, since they can be defined in the same file:

https://github.com/dbt-labs/dbt-core/blob/9836f7bdef00b7cb912ad88ab1bbbfbd4dc4b312/core/dbt/parser/snapshots.py#L24-L26

This feels like one more paper cut associated with snapshots being the only runnable node type that's defined in a Jinja block, and can have multiple per file, rather than just one node definition per file. I'm sure we could find some reasonable way to make this work, such as by making get_compiled_path include the (unique) name of the snapshot.
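To make the collision concrete, here is a minimal sketch, assuming the compiled path is derived purely from the source file path. The function names, the `target/compiled/<project>/...` layout, and the choice to swap the file name for the snapshot name are all assumptions for illustration, not how get_compiled_path is actually implemented.

```python
from pathlib import PurePosixPath


def compiled_path_by_file(project: str, original_file_path: str) -> str:
    """Hypothetical: derive the compiled path from the source file alone."""
    return str(PurePosixPath("target/compiled") / project / original_file_path)


def compiled_path_by_name(project: str, original_file_path: str, node_name: str) -> str:
    """Hypothetical: keep the source directory, but name the file after the node."""
    parent = PurePosixPath(original_file_path).parent
    return str(PurePosixPath("target/compiled") / project / parent / f"{node_name}.sql")


# Two snapshots defined in the same file (names are made up).
source_file = "snapshots/snaps.sql"
snapshot_names = ["orders_snapshot", "customers_snapshot"]

# File-based paths collide: both snapshots map to one output file.
by_file = {compiled_path_by_file("my_project", source_file) for _ in snapshot_names}

# Name-based paths stay distinct because snapshot names are unique.
by_name = {compiled_path_by_name("my_project", source_file, n) for n in snapshot_names}
```

Here `by_file` ends up with a single path while `by_name` has one per snapshot, which is the property a fix would need. Whether the name should replace the file name or become a subdirectory under it is a design choice left open here.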

I'm going to label this one help_wanted. The change itself is relatively straightforward:

In the meantime: While I understand it's inconsistent, it does feel like an improvement that "interactive" compile will include the compiled code of a snapshot.