dbt-labs / dbt-docs

Auto-generated data documentation site for dbt projects
Apache License 2.0

[Feature] Include filtered dimensions on metrics within the lineage graph #518

Open mattiasthalen opened 3 months ago

mattiasthalen commented 3 months ago

Is this a new bug in dbt-core?

Current Behavior

When defining a metric by filtering on dimensions from another semantic model, only the primary semantic model is connected in the DAG.

Expected Behavior

All semantic models used in the metric definition, including those referenced only in a filter, are connected as upstream nodes.

Steps To Reproduce

  1. Define metric
  2. Filter on condition from another semantic model
  3. Generate docs
  4. Serve docs
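
To check the outcome of steps 3 and 4 without opening the docs site, one option is to read the dependency edges straight from the manifest that `dbt docs generate` writes. The sketch below assumes the manifest is at the default `target/manifest.json` and that each entry carries a `depends_on.nodes` list; `upstream_nodes` and `load_manifest` are hypothetical helper names, not part of dbt.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest written by `dbt docs generate` (default location assumed)."""
    return json.loads(Path(path).read_text())


def upstream_nodes(manifest: dict, unique_id: str) -> set[str]:
    """Collect every node reachable from unique_id through depends_on.nodes."""
    # Models, sources, metrics and semantic models sit under separate top-level keys.
    lookup: dict = {}
    for section in ("nodes", "sources", "metrics", "semantic_models"):
        lookup.update(manifest.get(section, {}))

    seen: set[str] = set()
    stack = [unique_id]
    while stack:
        current = stack.pop()
        depends_on = lookup.get(current, {}).get("depends_on") or {}
        for parent in depends_on.get("nodes", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Calling `upstream_nodes(load_manifest(), "metric.<project>.health_centers_public")` with your own project name should list the primary semantic model; with the behavior described here, the semantic model that is only referenced in the filter would be absent.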

Relevant log output

No response

Environment

- OS: mcr.microsoft.com/devcontainers/python:3.12-bullseye
- Python: 3.12.3
- dbt: 1.8.3

Which database adapter are you using with dbt?

No response

Additional Context

In this case Public Health Centers is filtering on values from dimension_units via fact_contacts > link_contacts_units.

dbeatty10 commented 3 months ago

Thanks for reaching out @mattiasthalen !

Could you share example YAML code for these two steps to reproduce?

  1. Define metric
  2. Filter on condition from another semantic model

mattiasthalen commented 3 months ago

@dbeatty10, absolutely!

version: 2

semantic_models:
  - name: dimension_health_centers
    label: "Health Centers"
    description: "A semantic model for health centers"
    model: ref('dimension__units')

    entities:
      - name: health_center_snapshot_hk
        expr: unit_snapshot_hk
        type: primary

    dimensions:
      - name: health_center_management
        expr: unit__management
        label: "Health Center Management"
        type: categorical

  - name: fact_contacts
    label: "Contacts"
    description: "A semantic model for contacts"
    model: ref('fact__contacts')
    defaults:
      agg_time_dimension: measurement_year_month

    entities:
      - name: fact_record_snapshot_hk
        type: primary

      - name: health_center_snapshot_hk
        type: foreign

    dimensions:
      - name: measurement_year_month
        label: "Measurement Year Month"
        expr: (measurement_year_month||'-01')::date
        type: time
        type_params:
          time_granularity: month

    measures:
      - name: health_centers
        expr: health_center_snapshot_hk
        description: Number of health centers
        agg: count_distinct

metrics:
  - name: health_centers
    label: Health Centers
    description: Number of health centers
    type: simple
    type_params:
      measure:
        name: health_centers
        fill_nulls_with: 0
        join_to_timespine: true

  - name: health_centers_public
    label: Public Health Centers
    description: Number of public health centers
    type: derived
    type_params:
      expr: health_centers
      metrics:
        - name: health_centers
          filter: |
            coalesce({{ Dimension('health_center_snapshot_hk__health_center_management') }}, 'private') != 'private'

dbeatty10 commented 3 months ago

After connecting with @Jstein77 and @plypaul, we've determined that this is a known limitation, but it's something that we'd like to make possible. I'm going to update this to be an accepted feature request accordingly.

More detail

The current limitation exists because we resolve the filter at run time rather than at parse time, so its dependencies never get added to the manifest and therefore aren't displayed in the lineage. We believe we can implement this with something similar to the dependency resolution used for saved queries.
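
To illustrate what resolving a filter at parse time could involve, here is a minimal sketch that scans a filter string for `Dimension('entity__dimension')` calls and maps each one to the semantic model that owns both the entity and the dimension. This is not how dbt-core or MetricFlow implement dependency resolution; `filter_dependencies` is a hypothetical helper, the regular expression only handles plain `Dimension(...)` calls, and it assumes entity and dimension names never contain a double underscore themselves.

```python
import re

# Matches Dimension('entity__dimension') calls inside a Jinja filter template.
DIMENSION_CALL = re.compile(r"Dimension\(\s*['\"]([^'\"]+)['\"]\s*\)")

# Entity types treated here as "owning" the entity (assumption for this sketch).
OWNING_TYPES = ("primary", "unique", "natural")


def filter_dependencies(filter_sql: str, semantic_models: list[dict]) -> set[str]:
    """Return the names of semantic models whose dimensions the filter references."""
    found: set[str] = set()
    for reference in DIMENSION_CALL.findall(filter_sql):
        # The last "__" separates the dimension name from the entity path before it.
        entity_path, _, dimension = reference.rpartition("__")
        entity = entity_path.split("__")[-1]
        for model in semantic_models:
            owns_entity = any(
                e["name"] == entity and e["type"] in OWNING_TYPES
                for e in model.get("entities", [])
            )
            has_dimension = any(
                d["name"] == dimension for d in model.get("dimensions", [])
            )
            if owns_entity and has_dimension:
                found.add(model["name"])
    return found


# Trimmed-down versions of the two semantic models from this thread.
models = [
    {
        "name": "dimension_health_centers",
        "entities": [{"name": "health_center_snapshot_hk", "type": "primary"}],
        "dimensions": [{"name": "health_center_management"}],
    },
    {
        "name": "fact_contacts",
        "entities": [
            {"name": "fact_record_snapshot_hk", "type": "primary"},
            {"name": "health_center_snapshot_hk", "type": "foreign"},
        ],
        "dimensions": [{"name": "measurement_year_month"}],
    },
]
filter_sql = (
    "coalesce({{ Dimension('health_center_snapshot_hk__health_center_management') }}, "
    "'private') != 'private'"
)
dependencies = filter_dependencies(filter_sql, models)
```

For the filter in this issue, `dependencies` contains only `dimension_health_centers`, which is the extra upstream node the lineage graph would need to show.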