dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Feature] Lineage of sources from models in multi-project docs #9618

Closed: akerone closed this issue 7 months ago

akerone commented 7 months ago

Is this your first time submitting a feature request?

Describe the feature

When multiple projects share one documentation repository, many of those projects may use tables built by other projects as sources. Right now, the dbt docs Lineage Graph cannot trace these cross-project dependencies.

We'd like to see the full lineage of a table across multiple projects.
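For context on what needs to change: in manifest.json, parent_map maps each node's unique_id to its direct parents and child_map maps it to its direct children, and the script later in this issue adds entries to both. The minimal sketch below uses hypothetical node IDs and a hypothetical helper name (upstream_lineage). It walks parent_map upstream to show where lineage stops today: the source declared in project_b has no parents, even though a model in project_a builds that table.

```python
from collections import deque


def upstream_lineage(manifest, node_id):
    """Return the IDs of every node upstream of node_id via manifest["parent_map"]."""
    parent_map = manifest["parent_map"]
    seen = set()
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        for parent in parent_map.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical IDs: project_b declares a source for a table that project_a's model builds.
toy_manifest = {
    "parent_map": {
        "model.project_b.orders_summary": ["source.project_b.core.orders"],
        "source.project_b.core.orders": [],  # no parents: the trail stops here
        "model.project_a.orders": ["source.project_a.raw.orders"],
        "source.project_a.raw.orders": [],
    }
}

print(sorted(upstream_lineage(toy_manifest, "model.project_b.orders_summary")))
# ['source.project_b.core.orders']
```

The patch appends "model.project_a.orders" to that source's parent list. After that change, the same walk continues into project_a and reaches its raw source.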

Describe alternatives you've considered

The alternative is to go to the "Database" tab, look for the source, find the duplicated name, and click the right entry to reach the model.

Who will this benefit?

This will benefit anyone who runs dbt docs generate with multiple projects listed in packages.yml.

Are you interested in contributing this feature?

Yes, check the code below.

Anything else?

The following Python script writes a patched copy of manifest.json that works around this issue in dbt 1.7.8:

This works for me as an extra step, but I think it should be integrated as an option of the dbt docs generate command.

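```python
import json

def relation_key(node, name_field):
    # database.schema.table via model alias / source identifier, lowercased (assumes case-insensitive names).
    table = node.get(name_field) or node["name"]
    return f"{node['database']}.{node['schema']}.{table}".lower()

with open("target/manifest.json", "r", encoding="utf-8") as manifest_file:
    manifest = json.load(manifest_file)

# Map each model's physical relation to its unique_id.
model_relations = {}
for k, v in manifest['nodes'].items():
    if v['resource_type'] == "model":
        model_relations[relation_key(v, "alias")] = k

# Add a model -> source edge for each source that points at a model's table.
for k, v in manifest['sources'].items():
    model_id = model_relations.get(relation_key(v, "identifier"))
    if model_id is not None:
        parents = manifest['parent_map'].setdefault(k, [])
        children = manifest['child_map'].setdefault(model_id, [])
        if model_id not in parents:
            parents.append(model_id)
        if k not in children:
            children.append(k)

# Note: writes to ./manifest.json, not back into target/.
with open('manifest.json', 'w', encoding='utf-8') as f:
    json.dump(manifest, f, ensure_ascii=False, indent=2)
```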
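To spot-check the result, a sketch like the one below lists the model-to-source edges in a manifest. It reads only the sources and parent_map keys, so you can pass it the dict returned by json.load on the patched manifest.json. The helper name cross_project_edges and the toy data are hypothetical.

```python
def cross_project_edges(manifest):
    """Return sorted (model_id, source_id) pairs where a source has a model parent.

    In an unpatched manifest, sources normally have no parents, so any model
    parent found here was most likely added by the patch script.
    """
    edges = []
    for source_id in manifest["sources"]:
        for parent_id in manifest["parent_map"].get(source_id, []):
            if parent_id.startswith("model."):
                edges.append((parent_id, source_id))
    return sorted(edges)


# Hypothetical patched manifest, trimmed to the keys this check reads.
patched = {
    "sources": {"source.project_b.core.orders": {}, "source.project_a.raw.orders": {}},
    "parent_map": {
        "source.project_b.core.orders": ["model.project_a.orders"],
        "source.project_a.raw.orders": [],
    },
}

for model_id, source_id in cross_project_edges(patched):
    print(f"{model_id} -> {source_id}")
# model.project_a.orders -> source.project_b.core.orders
```

An empty result on a real multi-project manifest usually means no source matched a model's database, schema, and table name.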
dbeatty10 commented 7 months ago

Thanks for proposing this idea @akerone ! 🧠

We don't view this as within the purview of dbt docs generate. Instead, you can add extra post-processing steps like the one in your example.

Nic3Guy commented 7 months ago

@akerone Thx for the code, I used it to create dbt docs with multiple projects with amazing success ;)

akerone commented 7 months ago

Hi @Nic3Guy! I'm glad you found it. I hope the title is descriptive enough for everyone who uses "Option 4: Separate Team Repositories + One Documentation Repository" to find it.

Since it seems this won't be integrated into dbt, we'll have to keep a close eye on the release notes for each new dbt version. Any change to the structure of manifest.json may break this script.
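One way to catch this kind of breakage early is to check the manifest's schema version before patching. The minimal sketch below reads metadata.dbt_schema_version, which dbt records in manifest.json. Two details are assumptions: the v11 URL for dbt 1.7 should be checked against your own file, and check_manifest_schema is a hypothetical helper name.

```python
# Assumption: dbt 1.7.x writes manifest schema v11. Confirm against
# metadata.dbt_schema_version in your own target/manifest.json.
EXPECTED_SCHEMA = "https://schemas.getdbt.com/dbt/manifest/v11.json"


def check_manifest_schema(manifest, expected=EXPECTED_SCHEMA):
    """Raise before patching if the manifest uses an unexpected schema version."""
    found = manifest.get("metadata", {}).get("dbt_schema_version")
    if found != expected:
        raise RuntimeError(
            f"manifest schema is {found!r}, expected {expected!r}; "
            "review the lineage patch against the new schema before using it"
        )
    return found


print(check_manifest_schema({"metadata": {"dbt_schema_version": EXPECTED_SCHEMA}}))
# https://schemas.getdbt.com/dbt/manifest/v11.json
```

Calling it right after json.load in the patch script makes a dbt upgrade fail loudly. Without the check, an upgrade might silently produce a manifest with missing edges.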