datnguye / dbterd

Generate the ERD as a code from dbt artifacts
https://dbterd.datnguyen.de/
MIT License
208 stars 30 forks source link

Reverse relationships and ghost columns #37

Closed danny-chain closed 1 year ago

danny-chain commented 1 year ago

Describe the bug Say a relationship is defined in schema.yml on model X referencing model Y. However, sometimes in the output.dbml, the relationship is defined as Y => X, rather than X => Y. Furthermore, if the foreign_key column name on X is a different name from the primary_key column name on Y, then those are swapped as well. This also causes "ghost columns" to appear in the involved tables. For example, if the foreign_key column on model X is 'F', and the primary_key on model Y is 'P', then the relationship will sometimes be defined as: X[P] => Y[F]. This creates a ghost column 'F' on table Y, and a ghost column 'P' on table X.

To Reproduce Steps to reproduce the behavior: This behavior is random. See "additional context" for discovery

Expected behavior The relationship in output.dbml reflects the correct column names and direction of the relation.

Desktop (please complete the following information):

Additional context I dug into the dbterd source code a bit and believe the problem is related to the line of code table_map=manifest.parent_map[x] in the test_relationship.py. It appears that this code expects the first element in the pair of the parent_map to be the foreign model, and the second element in the pair is the "this" model. However, looking in my manifest.json, it looks like DBT makes no such guarantees; as far as its concerned it is an unordered pair, and the order is likely some result of the order of each model in the DAG that it was compiled.

So, I think there needs to be some small logic that determines if the pair at manifest.parent_map[x] needs to be swapped before it is assigned to table_map.

danny-chain commented 1 year ago

I did a test, and was able to solve the problem by creating a function

    def get_table_map(node):
        refs_arr = node.depends_on.nodes 
        foreign_model = node.test_metadata.kwargs['to']
        if f"'{refs_arr[1].split('.')[-1]}'" in foreign_model:
            refs_arr = [refs_arr[1],refs_arr[0]]
        return refs_arr

then replacing the call to parent_map with:

table_map=get_table_map(manifest.nodes[x]),

You'll note that I was able to find the same pair on node.depends_on.nodes instead of going through parent_map, and that pair does seem to be in reliable order, but kept the swap logic in the function because a) even though I'm seeing it always in order, it can't hurt to ensure in case its still not guaranteed, and b) if there's a reason you're referencing parent_map instead of depends_on (maybe depends_on.nodes isn't always present I'm not sure), you can add the parent_map[x] as another function parameter and it still works.

There's probably opportunity to make that function cleaner, but from a logic standpoint, it does appear to solve the bug.

datnguye commented 1 year ago

Thanks @danny-chain for raising the issue and its solution as well. I will work on it this weekend, but feel free to create PR while awaiting. Cheers

datnguye commented 1 year ago

Should be fixed in 1.2.3