Open christopherekfeldt opened 5 months ago
@christopherekfeldt Thanks for the report!
The mechanism we're using for `--empty` is to wrap the `source()` and `ref()` calls in a subquery with `select * ... where false limit 0`. This `*` doesn't pass along pseudo-columns.
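To make the failure concrete, here is a rough sketch of what a model that reads `_PARTITIONTIME` might compile to under `--empty`. The table name is borrowed from the example in this thread; the subquery alias is hypothetical and the exact compiled SQL may differ between dbt versions.

```sql
-- Sketch only: approximate shape of the compiled query under --empty.
select
    _PARTITIONTIME as ingestion_dt   -- fails: "Unrecognized name: _PARTITIONTIME"
from (
    select *                         -- expands to the regular columns only
    from dbt_jcohen.myingestiontable
    where false limit 0
) as empty_subquery                  -- hypothetical alias, for illustration
```

The pseudo-column exists on the underlying table, but because `select *` does not include it, it is no longer visible to the outer query.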
The first idea that came to mind:
- `_PARTITIONTIME`) will still fail

select *, _PARTITIONTIME as partition_time
from dbt_jcohen.myingestiontable
where false limit 0
Other ideas:
- `where false limit 0` without wrapping in a subquery (but this won't play nice with other `where` statements, `unnest`, etc.)
- `source()` from the default `--empty` subquery, but access `flags.EMPTY` to apply your own conditional filter

In the meantime, you can at least avoid the error by specifying `.render()` on any refs/sources that you don't want dbt to turn into `where false limit 0` subqueries.
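A minimal sketch of that interim workaround, assuming the source name used elsewhere in this thread and a model that only needs the pseudo-column exposed under an alias:

```sql
-- Sketch only: .render() emits the plain relation name (no --empty subquery),
-- so _PARTITIONTIME is still visible to this select.
select
    *,
    _PARTITIONTIME as ingestion_dt
from {{ source('customer_preference_center', 'customer_preference').render() }}
```

The trade-off is that this relation is no longer emptied by `--empty`, so the model will presumably read real data from it unless you add your own filter.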
If we added support for `flags.EMPTY`, then you could write something like:
{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}
select
...,
_PARTITIONTIME as ingestion_dt
from
{{ src_cpc_raw.render() }}, -- this will be rendered simply into `project.dataset.identifier` (no subquery)
unnest(centralPreferences) as centralPreferences
where 1=1
{% if flags.EMPTY %}
and false -- instead, I manually add the "empty" filter here
{% endif %}
{% if is_incremental() %}
and date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1
{% if flags.EMPTY %}
limit 0 -- the "empty limit" goes last, because limit must come after qualify
{% endif %}
My suggestion to solve this issue is related to https://github.com/dbt-labs/dbt-core/discussions/8560: we need to be able to override the rendering from sources/refs.
For sources, we could have a way to pass a parameter to the macro to add those metadata fields. For refs, since it would be tied to the `"time_ingestion_partitioning": True` config, we should be able to detect them ourselves.
Is this a new bug in dbt-bigquery?
Current Behavior
When trying out the `--empty` flag on my models, I get failures on all models that use the pseudo-column `_PARTITIONTIME` in their logic. Here is my query; it worked perfectly fine before:
But now dbt has swapped out the logic with a subquery that doesn't take the pseudo-column into consideration:
This gives the following error in BigQuery: "Unrecognized name: _PARTITIONTIME at [37:9]"
Expected Behavior
I expect the subquery to work with pseudo-columns as well.
Steps To Reproduce
Relevant log output
No response
Environment
Additional Context
No response