Open adammarples opened 1 year ago
Hi! Thanks for this interesting feature request.
I have to admit I don't know much about this bit of BigQuery.
My questions would be: is this something that should be dealt with in dbtvault or is it in fact already built into dbt? From the BigQuery dbt configs documentation it does allude to pseudo-columns.
Otherwise, this should already be possible doing something like the following:
WITH stage AS (
{{ dbtvault.stage() }}
),
pseudocolumns AS (
SELECT *, _FILE_NAME AS FILE_NAME
FROM stage
)
SELECT * FROM pseudocolumns
However, this is a little clunky and some kind of config variable as suggested would be a lot cleaner.
Hi. The problem as I see it there (and I am not an expert on dbt-vault or dbt-bigquery or how they interact) is that _FILE_NAME can only be selected directly from the source table, if it is not included in the initial CTE then it will be unavailable going forward. It will also require an alias ie. 'FILE_NAME'.
dbt-vault provides a file called dbtvault/macros/staging/stage.sql
which I believe is what dbt-vault runs when dbtvault.stage()
is called, and the _FILE_NAME needs to be inserted right there at the source.
WITH source_data AS (
SELECT
{{- "\n\n " ~ dbtvault.print_list(list_to_print=all_source_columns, columns_to_escape=columns_to_escape) if all_source_columns else " *" }}
, _FILE_NAME AS FILE_NAME -- <- we need this line optionally
FROM {{ source_relation }}
{%- set last_cte = "source_data" %}
)
I can have a go at a PR
Ok that makes sense, thanks for explaining!
I will add this to our backlog :) Happy for you to do a PR but at this time we cannot accept directly as our test harness is not publicly available and it does need to be run through that before release. We hope to improve contribution guidelines and processes in future.
It would be good to be able to use the _FILE_NAME pseudo-column which bigquery uses to hold the filename of the files in an external table.
Describe the solution you'd like To do this, it would need to be selected in the initial stage select CTE, because it is unavailable after that, ie. when derived columns are added. Presumably the best way to do that is to create a macros/tables/bigquery/stage.sql file with an "external_table_filename" kwarg flag, or if this could be covered with a src_extra_columns variable? There are other pseudo-columns too such as in partitioned tables.
Describe alternatives you've considered
Additional context Happy to work on this if it seems appropriate
AB#5366