The Bigquery deduplicate macro uses array_agg to deduplicate. The way it is currently set up, any query using the macro will not be able to partition prune downstream of the macro due to the way array_agg interacts with partition pruning because the partition column is not explicitly selected separate from the array agg. This can be avoided in a table by doing the partition filtering in a CTE prior to using the deduplicate macro, but it can't be avoided in a view.
Describe alternatives you've considered
For models materialized as tables, we can add the partition filtering above the deduplication. For views, the alternative is to write the deduplication step ourselves to manual select the partition column outside of the array_agg.
Basically it is written like this:
select unique.*
from (
select
array_agg (
original
order by {{ order_by }}
limit 1
)[offset(0)] unique
from {{ relation }} original
group by {{ partition_by }}
)
and instead it needs to look like this:
select <partition pruning column>, unique.* except(<partition pruning column>)
from (
select
<partition pruning column>,
array_agg (
original
order by {{ order_by }}
limit 1
)[offset(0)] unique
from {{ relation }} original
group by <partition pruning column>, {{ partition_by }}
)
Additional context
This is specific to the BQ deuplicate macro.
Who will this benefit?
BQ users who want to deduplicate in a view.
Are you interested in contributing this feature?
I can create a PR that implements the fix we used.
Describe the feature
The Bigquery deduplicate macro uses
array_agg
to deduplicate. The way it is currently set up, any query using the macro will not be able to partition prune downstream of the macro due to the wayarray_agg
interacts with partition pruning because the partition column is not explicitly selected separate from the array agg. This can be avoided in a table by doing the partition filtering in a CTE prior to using the deduplicate macro, but it can't be avoided in a view.Describe alternatives you've considered
For models materialized as tables, we can add the partition filtering above the deduplication. For views, the alternative is to write the deduplication step ourselves to manual select the partition column outside of the
array_agg
.Basically it is written like this:
and instead it needs to look like this:
Additional context
This is specific to the BQ deuplicate macro.
Who will this benefit?
BQ users who want to deduplicate in a view.
Are you interested in contributing this feature?
I can create a PR that implements the fix we used.