Closed dbeatty10 closed 11 months ago
@dbeatty10 would something like this be suitable? My concern now is that all tests that invoke this macro will need to include logic for all target types, since they use different data type names (e.g. text
in postgres
becomes string
in bigquery
). That might be cumbersome. Do you have any feedback/ suggestions?
That PR is looking good @linbug !
Absolutely agreed that the testing part looks cumbersome. #76 ran into something similar. We discussed and tried a couple options before settling on an approach.
There are a few options for your PR:
Want to take a look at these options and let me know what you think?
You won't hear me say this too often, but I might be comfortable with skipping tests in this case if it has the least downsides of the options.
In the original issue description, it said:
this option should probably default to
true
. It possibly shouldn't even be optional, but just always be included in the generated output.
It is easy enough to just add a new include_data_types
parameter though (defaulting to true) -- I think we should do it!
For users, this would preserve optionality while still putting forward default behavior we're assuming would be best for the greatest number of users.
generate_source
already has an include_data_types
(optional, default=False) parameter, so it would align those names (even though the default would differ). Maybe we should change the default for generate_source
at the same time?
Thanks @dbeatty10 for the quick and detailed responses!
For the testing, I had a look through that previous PR that you linked. I think that adopting the same approach (using text_type_value
and integer_type_value
macros) is the simplest (and easiest to read) solution for now and would align with what has already been shipped for generate_source
. This could always be refactored in the future if we find a better approach. I don't think that skipping tests is an option without removing the existing tests, although perhaps we could adopt something similar to the tests for generate_source
, where we make a include_data_types
parameter and set it to false
in the all tests but one (perhaps generate_model_yaml
). There isn't additional value to testing this functionality in more than one test anyway.
How does that sound?
I'm aligned on adding a include_data_types
parameter and defaulting to true
. I do think that it makes sense to have both generate_source
and generate_model_yaml
take the same default, and personally I'm in favour of this being true
(as long as this doesn't break anything for existing users)! I'd be happy to update that in this or a separate PR.
Something else I just noticed is that in generate_source
, data_type
is uppercase whereas here I'm lowering. We should be consistent about this too. Any preference? According to the dbt style guide, we should default to lowercase.
@linbug Your proposals for the testing sound good -- let's do it as you proposed.
For the common include_data_types
parameter within generate_source
and generate_model_yaml
, let's do the following:
true
It's fine if the changes related to generate_source
are made within this PR or within a separate PR -- up to you.
Since there will be changes to how generate_source
behaves by default, let's make sure to update the README and also call this out in the changelog.
The current version of dbt_codegen is 0.9.0, and we'll make sure to bump the next version to 0.10.0 (or maybe even 1.0.0!) so these changes don't break anything for folks that have the 0.9.x series as an upper bound.
@dbeatty10 I've made those changes now, and signed the CLA. The PR is ready for review.
Describe the feature
When generating output, codegen should have an option to include
data_type
like the following:Furthermore, this option should probably default to
true
. It possibly shouldn't even be optional, but just always be included in the generated output.See here for more context: https://github.com/dbt-labs/internal-analytics/pull/1418#discussion_r1144991800
Describe alternatives you've considered
An alternative is to copy-paste
data_type
by hand into the YAML. These would be copied from aninformation_schema.columns
query or show columns, etc.Additional context
Side note to spin out into a separate issue: reconciling greenfield YAML from codegen with pre-existing YAML is annoying / time-consuming. Can we write some kind of guidance on easier ways to reconcile two YAML files?
Who will this benefit?
With the launch of data contracts in the dbt-core 1.5.0 release, it's handy to automatically generate the relevant
data_types
for pre-existing models.Are you interested in contributing this feature?
Happy to review the PR if someone else wants to take this on.