dbt-labs / dbt-codegen

Macros that generate dbt code
https://hub.getdbt.com/dbt-labs/codegen/latest/
Apache License 2.0
459 stars 99 forks source link

fix: only select models in graph.nodes #132

Closed gsaiz closed 5 months ago

gsaiz commented 1 year ago

resolves #

This is a:

All pull requests from community contributors should target the main branch (default).

Description & motivation

I found about this bug while using the macro generate_model_yaml. It was incorrectly setting some column descriptions as empty. After some debugging, I found out that get_model_dependencies does not always resolve the node parent correctly.

This PR fixes an edge case where you might have multiple nodes with the same name, for example in my case I had 1 analysis and 1 model that were called the exact same.

The get_model_dependencies macro assumes that there is only 1 match (always returns on the first node), so it silently in this case.

Selecting only the models both fixes my example and also makes sure it doesn't break, since model names must be uniques (it is not true across all nodes in the graph).

Checklist

esegal commented 8 months ago

I have a use cases where sources are also useful. See #153 . I believe https://github.com/dbt-labs/dbt-codegen/pull/154 will also fix this issue.

gwenwindflower commented 6 months ago

hey @gsaiz sorry for the extremely long wait in getting this addressed, i've been doing some work to get this repo caught up. if you're still interested in contributing this i'd love to help you get it across the finish line! it looks great but as @esegal mentions, this logic has been improved when we merged his PR in yesterday. i think you may still need to filter the graph.nodes to only select from models in addition to sources in the logic he updated though, so if you wanted to take a look at the current code and re-apply your updates that would be awesome. if things have changed in your capacity in the months this has been waiting and you don't want to contribute this anymore, no worries at all! i can try and push some updates to address this myself, but i'd prefer to help you get this in yourself if you still want to 🙏🏻 .

gsaiz commented 5 months ago

Hi @gwenwindflower , thanks for putting in the work to catch up with the PRs.

I have had a look at @esegal 's changes, and it does seem like this PR is no longer needed, because now when you do this:

{% for node in nodes.values()
        | selectattr('resource_type', 'equalto', resource_type)
        | selectattr('name', 'equalto', model_name) %}
        {% for col_name, col_values in node.columns.items() %}
            {% do dict_with_descriptions.update( {col_name: col_values.description} ) %}
        {% endfor %}
    {% endfor %}

if you pass resource_type = 'model', then it is already filtering only the models.

Happy to close the PR since it is no longer needed.

gwenwindflower commented 5 months ago

@gsaiz ah, i see now, i looked more closely at the chain of calls and i see how model gets passed implicitly -- got it, thanks for clarifying. yep, let's go ahead and close this then. thank you! 💜