fix: only select models in graph.nodes

gsaiz commented 1 year ago

resolves #

This is a:

[ ] documentation update
[x] bug fix with no breaking changes
[ ] new functionality
[ ] a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

I found about this bug while using the macro generate_model_yaml. It was incorrectly setting some column descriptions as empty. After some debugging, I found out that get_model_dependencies does not always resolve the node parent correctly.

This PR fixes an edge case where you might have multiple nodes with the same name, for example in my case I had 1 analysis and 1 model that were called the exact same.

The get_model_dependencies macro assumes that there is only 1 match (always returns on the first node), so it silently in this case.

Selecting only the models both fixes my example and also makes sure it doesn't break, since model names must be uniques (it is not true across all nodes in the graph).

Checklist

[ ] This code is associated with an issue which has been triaged and accepted for development.
[x] I have verified that these changes work locally
[x] I have updated the README.md (if applicable)
[ ] I have added tests & descriptions to my models (and macros if applicable)
[x] I have added an entry to CHANGELOG.md

esegal commented 8 months ago

I have a use cases where sources are also useful. See #153 . I believe https://github.com/dbt-labs/dbt-codegen/pull/154 will also fix this issue.

gwenwindflower commented 6 months ago

hey @gsaiz sorry for the extremely long wait in getting this addressed, i've been doing some work to get this repo caught up. if you're still interested in contributing this i'd love to help you get it across the finish line! it looks great but as @esegal mentions, this logic has been improved when we merged his PR in yesterday. i think you may still need to filter the graph.nodes to only select from models in addition to sources in the logic he updated though, so if you wanted to take a look at the current code and re-apply your updates that would be awesome. if things have changed in your capacity in the months this has been waiting and you don't want to contribute this anymore, no worries at all! i can try and push some updates to address this myself, but i'd prefer to help you get this in yourself if you still want to 🙏🏻 .

gsaiz commented 5 months ago

Hi @gwenwindflower , thanks for putting in the work to catch up with the PRs.

I have had a look at @esegal 's changes, and it does seem like this PR is no longer needed, because now when you do this:

{% for node in nodes.values()
        | selectattr('resource_type', 'equalto', resource_type)
        | selectattr('name', 'equalto', model_name) %}
        {% for col_name, col_values in node.columns.items() %}
            {% do dict_with_descriptions.update( {col_name: col_values.description} ) %}
        {% endfor %}
    {% endfor %}

if you pass resource_type = 'model', then it is already filtering only the models.

Happy to close the PR since it is no longer needed.

gwenwindflower commented 5 months ago

@gsaiz ah, i see now, i looked more closely at the chain of calls and i see how model gets passed implicitly -- got it, thanks for clarifying. yep, let's go ahead and close this then. thank you! 💜

dbt-labs / dbt-codegen

fix: only select models in graph.nodes #132

Description & motivation

Checklist