Closed yauhen-sobaleu closed 1 year ago
Hey @yauhen-sobaleu, as discussed on the forum I'm very interested in this! I think it would be very similar to the width_bucket macro we already have, which has a native Snowflake version and for other warehouses we emulate it with other primitives from SQL.
I'll mark this as help wanted, it's a pretty beefy contribution but if you or someone else wants to have a go at building it, I'm happy to provide pointers.
I think the macro signature and implementation would look something like this. I haven't tested this at all, just wrote it on a plane so it will need some pretty serious finessing from here, as well as adding tests.
{% macro grouping_sets(group_sets, data_source, aggregate_expression) %}
{{ adapter.dispatch('grouping_sets', 'dbt_utils')(group_sets, data_source, aggregate_expression) }}
{% endmacro %}
{% macro default__grouping_sets(group_sets, data_source, aggregate_expression) %}
{% set all_list_elements = [] %}
{% for item in group_sets %}
{% do group_sets.append(item.values() %)
{% endfor %}
{% set unique_grouping_columns = set_strict(group_sets) %}
select {{ unique_grouping_columns | join(", ") }}, {{ aggregate_expression }}
from {{ data_source }}
group by grouping sets (
{% for grp in group_sets %}
( {{grp | join(", ") }} )
{% if not loop.last %}
,
{% endif %}
{% endfor %}
)
{% endmacro %}
{% macro redshift__grouping_sets(group_sets, data_source, aggregate_expression) %}
{% set all_list_elements = [] %}
{% for item in group_sets %}
{% do group_sets.append(item.values() %)
{% endfor %}
{% set unique_grouping_columns = set_strict(group_sets) %}
{% for group_set in group_sets %}
select
{% for column in unique_grouping_columns %}
{% if column in group_set %}
{{ column }}
{% else %}
--TODO: this won't work if it's the first usage of the column, would need to be cast as the right data type.
--union_relations does this well but means these have to be Relations instead of just a CTE.
--Alternative is to require a dictionary of data types to be passed in, but that's janky
null as {{ column }}
{% end %}
,
{% endfor %}
{{ aggregate_expression }}
from {{ data_source }}
group by {{ group_set.values() | join(", ") }}
{% if not loop.last %}
union all
{% endif %}
{% endfor %}
{% endmacro %}
Then it will be called as:
{{ grouping_sets([['county'], ['city'], ['county', 'city']]), 'sale_view', 'sum(margin) as total' }}
which will render to something that looks like one of these, depending on the warehouse:
select county, city, sum(margin) as total
from sale_view
group by grouping sets ((county), (city), (county, city))
select county, null, sum(margin) as total
from sale_view
group by county
union all
select null, city, sum('margin') as total
from sale_view
group by city
union all
select county, city, sum('margin') as total
from sale_view
group by county, city
Amazon Redshift released support (in preview) for multi-dimensional aggregate functions! https://aws.amazon.com/about-aws/whats-new/2022/11/amazon-redshift-sql-capabilities-speed-data-warehouse-migrations-preview/
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Describe the feature
I would like to have a database-agnostic macro mimicking
GROUPING SETS
aggregate function that was initially introduced in Postgres 9 (https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-GROUPING-SETS).The function is super useful when you want to combine multiple GROUP BY clauses in a single GROUP BY clause without the need to write O(N) group by clauses and unioning them together.
Unfortunately, both Redshift and BigQuery do not support
GROUPING SETS
out of the box (as well as other multi-dimensional aggregate functions likeROLLUP
andCUBE
).Example in Postgres SQL:
Example in Redshift SQL:
Additional context
Who will this benefit?
It will be useful for all analytics engineers who work with multi-dimensional reporting sparing them precious time otherwise spent copying
UNION ALL
expressions with minuscule differences over and over again