dbt-labs / dbt-codegen

Macros that generate dbt code
https://hub.getdbt.com/dbt-labs/codegen/latest/
Apache License 2.0

get table & column descriptions from information_schema metadata columns when generating yml #119

Closed jakub-auger closed 8 months ago

jakub-auger commented 1 year ago

Describe the feature

As a dev, I want to leverage the metadata that already exists in my database, namely the table and column descriptions captured in the information_schema.tables and information_schema.columns metadata tables.

Describe alternatives you've considered

Manually scripting out the info on a per-table basis.

Additional context

Current use case is on databricks w/unity catalog, but would be useful elsewhere

Who will this benefit?

The source tables are very wide (100s of columns), so manually typing descriptions into a yml file in Notepad isn't feasible.

It is more robust to enter the info for tables and columns via the Databricks UI. We then need to pull it out to expose it in the dbt docs.

Are you interested in contributing this feature?

kellybh123 commented 1 year ago

I have a similar ask. I have a bunch of source tables in BigQuery that already have descriptions, and I would like to port those over to the dbt model ymls as well. But when I create a base/source model yml using the generate_model_yaml macro, there is no way to tell it to look for descriptions on the source table and, if present, automatically put them in those first model ymls. I understand that once the descriptions are in the first model I can use upstream_descriptions in all downstream models, but I have 1000+ columns that I am not trying to copy and paste over, and they already have fairly adequate descriptions.

ShameGod commented 1 year ago

@kellybh123 I am encountering the same issue. Note that in dbt-bigquery, the column class that codegen uses to retrieve columns from BigQuery does not have a description attribute.

aaronsteers commented 9 months ago

I see we have some upvotes here and I'm interested also.

Any objection from maintainers about adding descriptions to columns and tables if those can be discovered from the table/column metadata?

Aka - if capacity opens up, would a contribution here be accepted? 😄

dbeatty10 commented 8 months ago

Thank you for opening this, @jakub-auger, and thanks to all of you who have shown interest in it!

It's not a priority for us to add this to dbt-codegen at this time, so we won't be accepting contributions.

Alternative approaches are sketched out here: https://github.com/dbt-labs/dbt-core/issues/9198#issuecomment-1877377688

TLDR

Upon the next release of dbt-osmosis, you should be able to do something like this:

dbt docs generate
dbt-osmosis yaml document --catalog-file target/catalog.json

Or you (or a different 3rd party tool) can utilize programmatic invocations to generate a Catalog artifact and use it to scaffold your YAML files with comments included.
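As a rough illustration of that last option, here is a minimal Python sketch that scaffolds a sources YAML from an already-generated catalog. It is not part of dbt-codegen or dbt-osmosis. The function name `scaffold_source_yaml` is hypothetical, and the sketch assumes the catalog follows the usual `catalog.json` layout: a top-level `sources` mapping keyed by unique IDs of the form `source.<project>.<source_name>.<table>`, with a `comment` field on each table's `metadata` and on each column. Check that layout against your own artifact before relying on it.

```python
import json


def scaffold_source_yaml(catalog: dict, source_name: str) -> str:
    """Render a dbt sources YAML string from a catalog.json-style dict.

    Assumption: unique IDs look like "source.<project>.<source_name>.<table>".
    """
    lines = ["version: 2", "", "sources:", f"  - name: {source_name}", "    tables:"]
    for unique_id, node in catalog.get("sources", {}).items():
        parts = unique_id.split(".")
        # Keep only the tables that belong to the requested source.
        if len(parts) < 4 or parts[2] != source_name:
            continue
        meta = node.get("metadata", {})
        lines.append(f"      - name: {meta['name']}")
        if meta.get("comment"):
            # json.dumps yields a double-quoted string, which YAML also accepts.
            lines.append(f"        description: {json.dumps(meta['comment'])}")
        lines.append("        columns:")
        # Preserve the warehouse column order recorded in the catalog.
        columns = sorted(node.get("columns", {}).values(), key=lambda c: c.get("index", 0))
        for col in columns:
            lines.append(f"          - name: {col['name']}")
            if col.get("comment"):
                lines.append(f"            description: {json.dumps(col['comment'])}")
    return "\n".join(lines) + "\n"
```

To try it, load the artifact written by `dbt docs generate` with `json.load(open("target/catalog.json"))` and pass the result in, along with the name of a source defined in your project. Columns and tables without a comment simply get no `description` key.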