Open gastlich opened 5 months ago
This would definitely help keep the YAML DRY. We have a few large models in our setup, and some of them share the same columns. The growing size of the YAML with repeated columns has become a maintenance concern. Changing the list to a dict seems like a simple yet smart fix that would let us use the YAML merge feature.
Is this your first time submitting a feature request?
Describe the feature
Currently, columns within `models:` have to be defined as a sequence. My proposal is to allow columns to be defined as a mapping in addition to the already supported sequence.
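As a rough illustration of the two shapes, here is how they might look once the YAML has been parsed into Python data, together with a small helper that accepts either one. This is only a sketch under assumptions: the helper name `iter_columns`, the sample column names, and the convention that the mapping key is the column name are all hypothetical and not part of DBT.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator, List, Union

Column = Dict[str, Any]

# Currently supported shape: a sequence of column definitions.
as_sequence: List[Column] = [
    {"name": "id", "description": "Primary key"},
    {"name": "status"},
]

# Proposed shape (assumed convention): a mapping keyed by column name.
# An empty YAML value such as `status:` parses to None.
as_mapping: Dict[str, Any] = {
    "id": {"description": "Primary key"},
    "status": None,
}


def iter_columns(columns: Union[List[Column], Mapping]) -> Iterator[Column]:
    """Yield column definitions in the sequence shape for either input shape."""
    if isinstance(columns, Mapping):
        for name, props in columns.items():
            # The mapping key becomes the column's name.
            yield {**(props or {}), "name": name}
    elif isinstance(columns, list):
        yield from columns
    else:
        raise TypeError(f"Unsupported type for columns: {type(columns).__name__}")
```

With a helper like this, downstream code would keep working on a plain sequence: `list(iter_columns(as_mapping))` and `list(iter_columns(as_sequence))` produce the same column definitions.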
This feature request is driven by a few factors.
Flexibility: The mapping format allows for more flexibility in defining columns. We can use YAML's native features for mappings, such as merge keys (https://yaml.org/type/merge.html). On the other hand, it doesn't force us to use them; we can still define columns as a simple sequence.
Readability: By following the DRY principle, the mapping format is more readable and less bloated than the sequence format. You don't have to repeat columns multiple times. In our case, we produce two types of marts: the latest "state" and the "history". The "history" mart has the same columns as the "state" mart, plus some additional columns. The mapping format would allow us to define the common columns once and then add the additional columns for the "history" mart.
Overall, I believe that allowing columns to be defined as a mapping in addition to a sequence would make DBT's YAML files easier to read and maintain.
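The "state" and "history" case can be sketched with plain Python dicts, which behave much like a YAML merge key on mappings: the shared entries are pulled in once, and keys written explicitly are added on top. The column names below are hypothetical and serve only as an illustration.

```python
# Hypothetical shared columns of the "state" mart, in the mapping shape.
state_columns = {
    "id": {"description": "Primary key"},
    "status": {"description": "Current status"},
}

# Rough Python equivalent of merging the "state" columns into the
# "history" mart and then adding history-only columns. As with a YAML
# merge key, explicitly written keys take precedence over merged ones.
history_columns = {
    **state_columns,
    "valid_from": {"description": "Start of the validity period"},
    "valid_to": {"description": "End of the validity period"},
}
```

The shared definitions live in one place, so a change to `state_columns` is picked up by both marts.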
I am not aware of any internal design decisions within DBT that would make it impossible to implement this feature. The change itself should be relatively simple to implement: check the data type of the `columns` key and then process it accordingly in a generator that yields sequence items.
Describe alternatives you've considered
YAML Limitations
As we know, YAML doesn't support flattening merged sequences, which makes the merge approach unsuitable for `columns` in its current sequence form. (Reference: YAML Issue #35) Additionally, YAMLScript is still in its early stages of development, so it may not be suitable for immediate use. (Reference: YAML Issue #48)
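The sequence limitation can be illustrated with the parsed result. The merge key is defined for mappings only, so the closest thing available for sequences is referencing a shared list from inside another list, which nests it rather than flattening it. The sketch below shows that outcome as Python data; the column names are hypothetical.

```python
# Hypothetical shared columns, defined once as a sequence.
common_columns = [{"name": "id"}, {"name": "status"}]

# What referencing the shared sequence as an item of another sequence
# amounts to after parsing: the list is nested, not spliced in.
history_columns = [common_columns, {"name": "valid_from"}]

# The first item is a list rather than a column definition, so code that
# expects every item to be a mapping with a "name" key would reject it.
first_item_is_column = isinstance(history_columns[0], dict)
```

Here `first_item_is_column` is `False`, which is why the sequence shape cannot take advantage of YAML-level reuse without extra flattening logic.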
DBT's Built-in Feature
I believe DBT should avoid implementing too many YAML-specific features to prevent reinventing the wheel. Outsourcing more features allows DBT to focus on data transformation.
Custom Solution
The same reasoning applies here.
Who will this benefit?
This feature will benefit all DBT users who deal with large models that are exposed in multiple flavours, like in our case the `state` and `history` models. It will also benefit users who want to define columns in a more flexible way, allowing them to use YAML's native features like merge.
Are you interested in contributing this feature?
Yes
Anything else?
No response