Open sarah-tuva opened 2 months ago
@sarah-tuva great callout! Before I get into the nitty gritty, I want to mention that:
.csv
seed table via a Snowflake data share. Seeds are sort of like sources, right? Not products we want to share? Change my view!Seed is a strange fruit compared to it's materialization brethren.
Table and Incremental models materializations share use of a single macro to make tables: snowflake__create_table_as()
which uses the copy_grants
config
However, Snowflake's seed materialization is actually dbt's default implementation, therefore does not use make use of copy_grants
at all. Specifically it uses: default__create_csv_table()
{% set sql %}
create table {{ this.render() }} (
{%- for col_name in agate_table.column_names -%}
{%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%}
{%- set type = column_override.get(col_name, inferred_type) -%}
{%- set column_name = (col_name | string) -%}
{{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%}
{%- endfor -%}
)
{% endset %}
If you really wanted to get this working, you could do so by introducing a new macro called snowflake__create_csv_table()
in your projects macros/
directory that looks almost exactly like but the default__
version but instead has the following lines peppered in
-- before DDL
{%- set copy_grants = config.get('copy_grants') -%}
-- before the line starting with "as (sql)"
{% if copy_grants is not none -%}
copy grants
{%- endif -%}
Hi @dataders! Thanks for providing this background and a potential workaround. We'll try to implement this and let you know how it goes.
For us, sharing seeds is a huge part of what we do. It is one of our key products. We are building open-source healthcare data analytics tools. Part of what we do is create and maintain a set of healthcare-specific terminology seed files. In most cases, our users can just install the Tuva package and run it in their own environment. In some cases, though, we manage this for customers and need to be able to share it back with them. Hence, the need to share seed files as part of the full analytics suite. Feel free to check out our repo.
Hello! I just wanted to report back that I was able to create a workaround for this. Thank you so much for leading me to the right place. Through testing, I figured out that the default reset_csv_table()
macro, which is triggered by using the --full-refresh
flag was dropping the seed table and then inserting the CSV data. This means the grants for the data share were being dropped as well. I ended up having to create two new versions of the default macros, create_csv_table()
and reset_csv_table()
.
Here is how the fix works.
models:
+copy_grants: true
seeds: +copy_grants: true
2. Create two custom macros and add them to the project's `macros/` folder:
{% macro snowflake__create_csv_table(model, agate_table) %} {%- set column_override = model['config'].get('column_types', {}) -%} {%- set quote_seed_column = model['config'].get('quote_columns', None) -%} {%- set copy_grants = model['config'].get('copy_grants') -%}
{% set sql %} create or replace table {{ this.render() }} ( {%- for col_name in agate_table.column_names -%} {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%} {%- set type = column_override.get(col_name, inferred_type) -%} {%- set column_name = (col_name | string) -%} {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%} {%- endfor -%} ) {% if copy_grants and not temporary -%} copy grants {%- endif %} {% endset %}
{% call statement('_') -%} {{ sql }} {%- endcall %}
{{ return(sql) }} {% endmacro %}
{% macro snowflakereset_csv_table(model, full_refresh, old_relation, agate_table) %} {% set sql = "" %} {% if full_refresh %} {% set sql = snowflakecreate_csv_table(model, agate_table) %} {% else %} {{ adapter.truncate_relation(old_relation) }} {% set sql = "truncate table " ~ old_relation %} {% endif %}
{{ return(sql) }}
{% endmacro %}
I would be happy to create a PR if you think this fix should go in the adapter. It might only be an issue for us since we do share seeds with Snowflake data share.
great to hear that you're unblocked and that I pointed you in a helpful direction.
You can certainly open a PR, however, given that we're not certain if there's other's who would benefit, it doesn't strike me as a priority to get merged.
In the meantime, we can leave this issue open for a while to see if there's interest from others.
Is this a new bug in dbt-snowflake?
Current Behavior
We added the
copy_grants: true
config to our dbt_project.yml. We are setting this option for the base project and a package. We also have a data share set up in Snowflake to share the tables created by the dbt project with another Snowflake account.(note: we also tried just setting the copy_grants variable at the highest levels of the models and seeds.)
When we run
dbt build --full-refresh
, the model tables are still shared, but the seed tables become unshared (i.e. the seed tables become unchecked in the Snowflake data share settings and we have to reshare them). When we rundbt build
, we do not see this problem.Expected Behavior
Both the model and seed tables should be fully refreshed and stay shared in the Snowflake data share. We should not have to re-share them each time we run
dbt build --full-refresh
.Steps To Reproduce
dbt build
dbt build --full-refresh
Relevant log output
No response
Environment
Additional Context
No response