dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
258 stars 158 forks source link

[Feature] copy_grants option for seeds (for use in Snowflake Data Share) #1030

Open sarah-tuva opened 2 months ago

sarah-tuva commented 2 months ago

Is this a new bug in dbt-snowflake?

Current Behavior

We added the copy_grants: true config to our dbt_project.yml. We are setting this option for the base project and a package. We also have a data share set up in Snowflake to share the tables created by the dbt project with another Snowflake account.

models:
  +copy_grants: true
  the_tuva_project:
    +copy_grants: true

seeds:
  +copy_grants: true
  the_tuva_project:
    terminology:
      +copy_grants: true
    value_sets:
      +copy_grants: true

(note: we also tried just setting the copy_grants variable at the highest levels of the models and seeds.)

When we run dbt build --full-refresh, the model tables are still shared, but the seed tables become unshared (i.e. the seed tables become unchecked in the Snowflake data share settings and we have to reshare them). When we run dbt build, we do not see this problem.

Expected Behavior

Both the model and seed tables should be fully refreshed and stay shared in the Snowflake data share. We should not have to re-share them each time we run dbt build --full-refresh.

Steps To Reproduce

  1. Add the copy_grants config to the dbt_project.yml
  2. Run dbt build
  3. Create a data share in Snowflake and share the tables created by models and seeds
  4. Run dbt build --full-refresh
  5. The seed tables become unchecked and are no longer shared, the model tables are still shared

Relevant log output

No response

Environment

- OS: macOS-14.4.1-arm64-arm-64bit
- Python: 3.9.6
- dbt-core: 1.7.4
- dbt-snowflake: 1.7.1

Additional Context

No response

dataders commented 1 month ago

@sarah-tuva great callout! Before I get into the nitty gritty, I want to mention that:

  1. a valid workaround would be to make a wrapper view or table model of this seed table and share that instead
  2. I'm especially interested in this workaround because I don't understand the use case for sharing a .csv seed table via a Snowflake data share. Seeds are sort of like sources, right? Not products we want to share? Change my view!

nitty gritty

Seed is a strange fruit compared to it's materialization brethren.

Table and Incremental models materializations share use of a single macro to make tables: snowflake__create_table_as() which uses the copy_grants config

https://github.com/dbt-labs/dbt-snowflake/blob/5459fbf0aa1bb40e800b89200442fcab47504157/dbt/include/snowflake/macros/relations/table/create.sql#L35-L37

However, Snowflake's seed materialization is actually dbt's default implementation, therefore does not use make use of copy_grants at all. Specifically it uses: default__create_csv_table()

  {% set sql %}
    create table {{ this.render() }} (
        {%- for col_name in agate_table.column_names -%}
            {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%}
            {%- set type = column_override.get(col_name, inferred_type) -%}
            {%- set column_name = (col_name | string) -%}
            {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%}
        {%- endfor -%}
    )
  {% endset %}

If you really wanted to get this working, you could do so by introducing a new macro called snowflake__create_csv_table() in your projects macros/ directory that looks almost exactly like but the default__ version but instead has the following lines peppered in

-- before DDL
{%- set copy_grants = config.get('copy_grants') -%}

-- before the line starting with "as (sql)"
{% if copy_grants is not none -%}
   copy grants
{%- endif -%}
sarah-tuva commented 1 month ago

Hi @dataders! Thanks for providing this background and a potential workaround. We'll try to implement this and let you know how it goes.

For us, sharing seeds is a huge part of what we do. It is one of our key products. We are building open-source healthcare data analytics tools. Part of what we do is create and maintain a set of healthcare-specific terminology seed files. In most cases, our users can just install the Tuva package and run it in their own environment. In some cases, though, we manage this for customers and need to be able to share it back with them. Hence, the need to share seed files as part of the full analytics suite. Feel free to check out our repo.

sarah-tuva commented 1 month ago

Hello! I just wanted to report back that I was able to create a workaround for this. Thank you so much for leading me to the right place. Through testing, I figured out that the default reset_csv_table() macro, which is triggered by using the --full-refresh flag was dropping the seed table and then inserting the CSV data. This means the grants for the data share were being dropped as well. I ended up having to create two new versions of the default macros, create_csv_table() and reset_csv_table().

Here is how the fix works.

  1. Add the copy_grants config to the dbt_project.yml:
    
    models:
    +copy_grants: true

seeds: +copy_grants: true

2. Create two custom macros and add them to the project's `macros/` folder:

{% macro snowflake__create_csv_table(model, agate_table) %} {%- set column_override = model['config'].get('column_types', {}) -%} {%- set quote_seed_column = model['config'].get('quote_columns', None) -%} {%- set copy_grants = model['config'].get('copy_grants') -%}

{% set sql %} create or replace table {{ this.render() }} ( {%- for col_name in agate_table.column_names -%} {%- set inferred_type = adapter.convert_type(agate_table, loop.index0) -%} {%- set type = column_override.get(col_name, inferred_type) -%} {%- set column_name = (col_name | string) -%} {{ adapter.quote_seed_column(column_name, quote_seed_column) }} {{ type }} {%- if not loop.last -%}, {%- endif -%} {%- endfor -%} ) {% if copy_grants and not temporary -%} copy grants {%- endif %} {% endset %}

{% call statement('_') -%} {{ sql }} {%- endcall %}

{{ return(sql) }} {% endmacro %}

{% macro snowflakereset_csv_table(model, full_refresh, old_relation, agate_table) %} {% set sql = "" %} {% if full_refresh %} {% set sql = snowflakecreate_csv_table(model, agate_table) %} {% else %} {{ adapter.truncate_relation(old_relation) }} {% set sql = "truncate table " ~ old_relation %} {% endif %}

{{ return(sql) }}

{% endmacro %}



I would be happy to create a PR if you think this fix should go in the adapter. It might only be an issue for us since we do share seeds with Snowflake data share.
dataders commented 1 month ago

great to hear that you're unblocked and that I pointed you in a helpful direction.

You can certainly open a PR, however, given that we're not certain if there's other's who would benefit, it doesn't strike me as a priority to get merged.

In the meantime, we can leave this issue open for a while to see if there's interest from others.