Closed darrylng closed 2 years ago
@darrylng Thanks for opening! I'm going to transfer this issue over to the dbt-bigquery repo, since that's where the code change will need to happen. There are a few pieces to this, which I'll try to walk through below.
Generally, labels are added as part of the `create table as` + `create view as` statements, via the `options` clause:
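As a rough illustration of what that `options` clause looks like once rendered, here is a minimal Python sketch. The helper names `render_label_options` and `create_table_as` are hypothetical and are not part of dbt-bigquery. The sketch assumes label keys and values contain no quote characters, so it does no escaping.

```python
def render_label_options(labels):
    """Render a dict of labels as a BigQuery OPTIONS clause (hypothetical helper)."""
    # Sort by key so the rendered SQL is deterministic.
    pairs = ", ".join(
        '("{}", "{}")'.format(key, value) for key, value in sorted(labels.items())
    )
    return "OPTIONS(labels=[{}])".format(pairs)


def create_table_as(relation, select_sql, labels):
    """Build a `create table as` statement carrying labels (illustrative only)."""
    # In BigQuery DDL the OPTIONS clause comes before the AS query.
    return "create or replace table `{}`\n{}\nas (\n{}\n)".format(
        relation, render_label_options(labels), select_sql
    )


if __name__ == "__main__":
    print(create_table_as("my-project.my_dataset.my_model", "select 1 as id", {"team": "analytics"}))
```

A seed never goes through a statement like this, which is why labels configured on seeds are not applied.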
However, when BigQuery creates a seed, it doesn't use a `create table as` statement. (The `bigquery__create_csv_table` macro exists, since it's called by the `seed` materialization, but it doesn't do anything.) Instead, we use the `load_dataframe` method, which ultimately calls the BigQuery Python client's `load_table_from_file` method.
After that method is called, we do run some `alter` statements, if `persist_docs` is enabled, for the purposes of persisting descriptions as table-level and column-level comments:
So I think the approach here could look like either of these:
1. Running `alter table {{ model }} set {{ bigquery_table_options(config, model, temporary) }}` within `bigquery__load_csv_rows`, after `load_dataframe`
2. Setting `labels` (and other table options) within the Python code of `load_dataframe`
Personally, I much prefer the first option! I think this could be a straightforward addition.
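For comparison, the second option (setting labels from Python after the load completes) might look roughly like the sketch below. It assumes `client` behaves like a `google.cloud.bigquery.Client`, using only its `get_table` and `update_table` methods. The function name `apply_seed_labels` is hypothetical and is not taken from the adapter.

```python
def apply_seed_labels(client, table_id, labels):
    """Merge `labels` into the labels of an existing table and save them.

    `client` is assumed to expose `get_table` and `update_table` with the
    same behavior as google.cloud.bigquery.Client.
    """
    table = client.get_table(table_id)  # e.g. "my-project.my_dataset.my_seed"
    # Keep any labels already on the table; new values win on key conflicts.
    table.labels = {**(table.labels or {}), **labels}
    # Restrict the update to the `labels` field only.
    return client.update_table(table, ["labels"])
```

This keeps the label logic in Python rather than in a macro, so it would not reuse the existing `bigquery_table_options` handling the way the first option does.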
I don't see any great integration tests in the plugin today for validating that labels have been applied to models configured with `labels`, but it should be as simple as a query to the information schema to verify the presence of the expected label: https://cloud.google.com/bigquery/docs/information-schema-tables#table_options_view
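A test along those lines could build its verification query with something like the sketch below. The function name `label_check_query` is hypothetical. The sketch assumes the `TABLE_OPTIONS` view exposes `table_name`, `option_name`, and `option_value` columns as described in the linked documentation. The exact rendering of `option_value` for labels is worth confirming there before asserting on it.

```python
def label_check_query(project, dataset, table_name):
    """Build a query returning the `labels` option recorded for one table."""
    return (
        "select option_value\n"
        "from `{project}.{dataset}.INFORMATION_SCHEMA.TABLE_OPTIONS`\n"
        "where table_name = '{table}'\n"
        "  and option_name = 'labels'"
    ).format(project=project, dataset=dataset, table=table_name)


if __name__ == "__main__":
    print(label_check_query("my-project", "my_dataset", "my_seed"))
```

If the query returns no rows for a seed configured with labels, the labels were not applied.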
With that, I'm going to transfer and mark this one a `good first issue`.
Awesome! Thanks @jtcohen6 for your response, I will try and tackle it when I have some time.
@jtcohen6 I have opened a PR to address this issue and would appreciate your review and feedback.
I also noticed that the `.changes` and `changie.yaml` files are not committed to this repo, so the `CHANGELOG.md` was manually edited instead of generated by `changie`.
Is there an existing feature request for this?
Describe the Feature
Currently, models support the `+labels:` config as follows:
However, seeded models do not currently support this, at least not that I am aware of. Having the `+labels:` config under `seeds:` does not currently result in labels being set when running `dbt seed` against the database.
Describe alternatives you've considered
I have written a macro that is always run as a post-hook for all seeded models.
And in `dbt_project.yml`, I attach the macro as a post-hook to all models in the seeds config:
Who will this benefit?
BigQuery users who need to seed data into the database but also need labels applied.
Are you interested in contributing this feature?
Yes, if this is a feature that is wanted.
Anything else?
No response