dbt-labs / dbt-codegen

Macros that generate dbt code
https://hub.getdbt.com/dbt-labs/codegen/latest/
Apache License 2.0
464 stars 102 forks source link

Add optional arguments to include database and schema properties in `sources.yml` generated from `generate_source` #123

Closed jeremyholtzman closed 1 year ago

jeremyholtzman commented 1 year ago

Describe the feature

The generate_source macro should always include the properties for database and schema to explicitly call out where the source data comes from. This should be a best practice especially for those who are new to dbt and the source.yml files.

Describe alternatives you've considered

The alternative is the current method - where database and schema are only added if they are different from the target database and name argument respectively.

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here. This should work across databases just as generate_source works currently.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.

Are you interested in contributing this feature?

Yes

dbeatty10 commented 1 year ago

@jeremyholtzman thanks for opening this feature suggestion!

The docs for sources notes that:

*By default, schema will be the same as name. Add schema only if you want to use a source name that differs from the existing schema.

If we update the behavior of codegen, we'll want to re-consider the language in those docs at the same time.

Would like to get affirmation from one of the maintainers of dbt-project-evaluator of the best practices for the schema property.

@grace do you know who could weigh in on automatically adding schema to codegen?

jeremyholtzman commented 1 year ago

I did post an internal Slack poll in our training team channel and so far to see in general Do you think it’s best practice to always define both the database and schema properties for all sources configured in the sources.yml file?

So far, all 5 others (luckily enough including Grace) have seemed to agree that it is best practice, but I'll definitely let Grace weigh in here directly. If we do make this change, I'd definitely love to change the docs for sources like you mentioned

jeremyholtzman commented 1 year ago

@dbeatty10 I think you meant to tag in @graciegoheen for your question above

dbeatty10 commented 1 year ago

Thanks @jeremyholtzman -- Indeed, I did intend to tag @graciegoheen.

The results of the internal Slack poll sound good to me -- let's proceed with reviewing the implementation you started in https://github.com/dbt-labs/dbt-codegen/pull/124

graciegoheen commented 1 year ago

I definitely like to do this when teaching/training new dbt users because it demonstrates all of the levers they have available for defining source definitions. That being said, it's not DRY and would lead to longer yml files - so I welcome a debate as to whether or not we would consider this "best practice."

Another enhancement here could be adjusting the words "schema" and "database" depending on the warehouse you're in - for example, in BigQuery use dataset (instead of schema) and project (instead of database).

jeremyholtzman commented 1 year ago

I love the idea of adjusting the words schema and database depending on the warehouse!

Some points from the Slack thread that argue for adding the database and schema fields:

jeremyholtzman commented 1 year ago

@dbeatty10 Below are the updated results from the poll for this feature:

I'm not sure whether you think we should or shouldn't proceed with the task

graciegoheen commented 1 year ago

Thoughts on just adding an optional argument to determine the behavior of this macro?

jeremyholtzman commented 1 year ago

Yeah that might be the easiest solution, I'm good with that!

dbeatty10 commented 1 year ago

Thanks for soliciting all that feedback @jeremyholtzman ! 🏆

Agreed, let's go with @graciegoheen's suggestion of making this optional to preserve the current behavior, but also allow for richer configurability.

Specifically, let's add two new optional parameters to generate_source (both defaulting to False):

When either or both of these are True, then they will explicitly include database and/or schema within the output YAML.

Further discussion of implementation details can take place in the associated pull request: https://github.com/dbt-labs/dbt-codegen/pull/124.