Closed jeremyholtzman closed 1 year ago
@jeremyholtzman thanks for opening this feature suggestion!
The docs for sources notes that:
*By default, schema will be the same as name. Add schema only if you want to use a source name that differs from the existing schema.
If we update the behavior of codegen, we'll want to re-consider the language in those docs at the same time.
Would like to get affirmation from one of the maintainers of dbt-project-evaluator of the best practices for the schema
property.
@grace do you know who could weigh in on automatically adding schema
to codegen?
I did post an internal Slack poll in our training team channel and so far to see in general Do you think it’s best practice to always define both the database and schema properties for all sources configured in the sources.yml file?
So far, all 5 others (luckily enough including Grace) have seemed to agree that it is best practice, but I'll definitely let Grace weigh in here directly. If we do make this change, I'd definitely love to change the docs for sources like you mentioned
@dbeatty10 I think you meant to tag in @graciegoheen for your question above
Thanks @jeremyholtzman -- Indeed, I did intend to tag @graciegoheen.
The results of the internal Slack poll sound good to me -- let's proceed with reviewing the implementation you started in https://github.com/dbt-labs/dbt-codegen/pull/124
I definitely like to do this when teaching/training new dbt users because it demonstrates all of the levers they have available for defining source definitions. That being said, it's not DRY and would lead to longer yml files - so I welcome a debate as to whether or not we would consider this "best practice."
Another enhancement here could be adjusting the words "schema" and "database" depending on the warehouse you're in - for example, in BigQuery use dataset (instead of schema) and project (instead of database).
I love the idea of adjusting the words schema
and database
depending on the warehouse!
Some points from the Slack thread that argue for adding the database and schema fields:
@dbeatty10 Below are the updated results from the poll for this feature:
I'm not sure whether you think we should or shouldn't proceed with the task
Thoughts on just adding an optional argument to determine the behavior of this macro?
Yeah that might be the easiest solution, I'm good with that!
Thanks for soliciting all that feedback @jeremyholtzman ! 🏆
Agreed, let's go with @graciegoheen's suggestion of making this optional to preserve the current behavior, but also allow for richer configurability.
Specifically, let's add two new optional parameters to generate_source
(both defaulting to False
):
include_database
include_schema
When either or both of these are True
, then they will explicitly include database
and/or schema
within the output YAML.
Further discussion of implementation details can take place in the associated pull request: https://github.com/dbt-labs/dbt-codegen/pull/124.
Describe the feature
The
generate_source
macro should always include the properties fordatabase
andschema
to explicitly call out where the source data comes from. This should be a best practice especially for those who are new to dbt and the source.yml files.Describe alternatives you've considered
The alternative is the current method - where
database
andschema
are only added if they are different from the target database and name argument respectively.Additional context
Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here. This should work across databases just as
generate_source
works currently.Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
Are you interested in contributing this feature?
Yes