elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0

lineage graph not displaying model versions #973

Open sambloom92 opened 1 year ago

sambloom92 commented 1 year ago

Describe the bug

dbt 1.5 introduced model versioning; however, the elementary observability report does not display model versions in the lineage graph. Each version appears as a separate node but with exactly the same name, so the graph does not accurately reflect the actual DAG. This diverges from dbt-core's lineage graph, which does display the versions correctly.

To Reproduce

Steps to reproduce the behavior:

  1. Create 2 or more versions of a model in a project with elementary set up.
  2. Run dbt build
  3. Run edr report
  4. Click on the 'Lineage' tab

Expected behavior

The lineage graph should display the model name together with its version, as dbt-core's docs do.
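A minimal sketch of the labelling the report could use, assuming nodes shaped like model entries in dbt's `manifest.json`, where the versions of a model share one `name` but carry distinct `unique_id` and `version` values (plus a `latest_version` field). The `display_label` helper, the `my_project` package name and the sample node dicts are hypothetical, and the `<name>.v<version>` label format is chosen for illustration rather than copied from dbt docs.

```python
from typing import Any


def display_label(node: dict[str, Any]) -> str:
    """Return a lineage-graph label that keeps model versions distinct.

    Hypothetical helper: `node` is assumed to look like a model entry in
    dbt's manifest.json, where versioned models share the same `name`
    but carry a `version` field.
    """
    version = node.get("version")
    if version is None:
        # Unversioned model: the plain name is already unique.
        return node["name"]
    label = f"{node['name']}.v{version}"
    # Optionally flag the latest version, as suggested later in the thread.
    if node.get("latest_version") == version:
        label += " (latest)"
    return label


# Hypothetical manifest-like nodes for two versions of one model.
nodes = [
    {"unique_id": "model.my_project.transaction_features.v1",
     "name": "transaction_features", "version": 1, "latest_version": 2},
    {"unique_id": "model.my_project.transaction_features.v2",
     "name": "transaction_features", "version": 2, "latest_version": 2},
]

# Labelling by name alone reproduces the bug: two nodes, one label.
print([n["name"] for n in nodes])
# Including the version keeps the nodes distinguishable.
print([display_label(n) for n in nodes])
```

The key point is that the graph already has distinct node ids; only the display label drops the version.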

Screenshots

Below is an example of the dbt-core docs lineage graph, followed by the lineage graph in the elementary observability report:

[Screenshots (2023-06-30): dbt-core docs lineage graph; elementary observability report lineage graph]

Environment

Maayan-s commented 1 year ago

Thanks for reporting this, @sambloom92! Just to get some context:

  1. Why would you keep transaction_features_v1?
  2. Should we follow dbt and display both, just with the version, or only display the new one?
  3. When you query the model, do you use the name transaction_features_v2 or transaction_features? (I want to understand whether this is how it is actually named in the database.)
sambloom92 commented 1 year ago

Hi @Maayan-s,

For context, this page from dbt gives a really good overview of the new versioning features, but to answer your questions:

  1. Sometimes there are downstream dependencies on older model versions, or there are conflicting requirements, so keeping multiple versions allows downstream users to migrate to using the newer version at their own pace, or keep using the older version indefinitely if the new version does not meet their requirements. Essentially it provides a way of managing releases for breaking changes.
  2. I think aligning with dbt is the best approach, so display all versions. A good enhancement would be some highlighting or filtering so you can easily see which version is the latest, but it is probably worth waiting to see whether dbt plans to do that first.
  3. There are a few different ways to reference a versioned model (see here), but in the actual database the table/view name may or may not include the version suffix; it depends entirely on how the model is configured (see this section), so you can't rely on the physical table/view name having the version suffix.
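To make answers 1 and 3 concrete, here is a rough sketch of how a reference to a versioned model resolves and which relation name it gets by default. It assumes the behaviour described in dbt's model-versions docs: an unpinned `ref('transaction_features')` resolves to `latest_version` (which defaults to the highest version when not set), a pinned `ref('transaction_features', v=1)` resolves to that version, and each version's relation is named `<model_name>_v<version>` unless an `alias` is configured. The `resolve_ref` and `relation_name` helpers are hypothetical and are not part of dbt or elementary.

```python
from typing import Optional


def resolve_ref(versions: list[int], latest_version: Optional[int] = None,
                v: Optional[int] = None) -> int:
    """Pick the version a ref() call would resolve to (sketch of dbt's rules)."""
    if v is not None:
        # Pinned ref: must name a version that actually exists.
        if v not in versions:
            raise ValueError(f"version {v} is not defined")
        return v
    # Unpinned ref: use latest_version, defaulting to the highest version.
    return latest_version if latest_version is not None else max(versions)


def relation_name(model_name: str, version: int,
                  alias: Optional[str] = None) -> str:
    """Name of the table/view in the warehouse (sketch of dbt's default)."""
    # An explicit alias wins; otherwise the default is <model_name>_v<version>.
    return alias if alias is not None else f"{model_name}_v{version}"


versions = [1, 2]
print(resolve_ref(versions))        # unpinned consumers follow the latest version
print(resolve_ref(versions, v=1))   # pinned consumers stay on v1
print(relation_name("transaction_features", 1))
print(relation_name("transaction_features", 2, alias="transaction_features"))
```

The last call illustrates answer 3: once an alias is configured, v2 lives under the unversioned name, so the version cannot be read reliably from the physical table/view name and has to come from the model's metadata instead.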