dremio / dbt-dremio

dbt (data build tool) adapter for the Dremio
Apache License 2.0

Allow for setting of custom schema for view pointing to materialized tables #240

Open ethanberrett opened 1 month ago

ethanberrett commented 1 month ago

Describe the enhancement requested

When a materialized table is created, a corresponding Dremio view is created in dremio_space_folder. However, there is no way to customize where in Dremio that view gets saved. I have overridden the built-in dbt macro generate_schema_name as follows:

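{% macro generate_schema_name(
        custom_schema_name,
        node
    ) -%}
    {#- Views: mirror the model's folder path, minus the top-level directory and the file name. -#}
    {% set schema = node.original_file_path.split('/')[1:-1] | join('.') %}
    {%- if node.config.materialized == 'table' -%}
        {#- Tables: prefix the folder path (taken from fqn) with the target's object storage path. -#}
        {%- set s3_path = target.object_storage_path -%}
        {%- set schema = s3_path ~ "." ~ node.fqn[1:-1] | join('.') -%}
    {% endif %}
    {{ schema | trim }}
{%- endmacro %}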

This lets me have my Dremio space and the folders therein match my models folder in my repo. The same goes for the object storage (S3 in my case).
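To make the effect of the macro concrete, here is a small Python sketch that mirrors its two branches for one model. It is only an illustration of the string logic, not dbt code: the function name resolve_schema, the example file path, the fqn list, and the object_storage_path value "s3_source.bucket" are all assumptions made up for the example.

```python
def resolve_schema(original_file_path, fqn, materialized, object_storage_path):
    """Mirror the branches of the generate_schema_name override (illustrative only)."""
    # Views: drop the top-level directory and the file name, join the rest with dots.
    schema = ".".join(original_file_path.split("/")[1:-1])
    if materialized == "table":
        # Tables: drop the project name and the model name from fqn,
        # then prefix the result with the object storage path.
        schema = object_storage_path + "." + ".".join(fqn[1:-1])
    return schema.strip()


# Hypothetical model located at models/marts/finance/orders.sql
path = "models/marts/finance/orders.sql"
fqn = ["my_project", "marts", "finance", "orders"]

view_schema = resolve_schema(path, fqn, "view", "s3_source.bucket")
table_schema = resolve_schema(path, fqn, "table", "s3_source.bucket")
print(view_schema)   # marts.finance
print(table_schema)  # s3_source.bucket.marts.finance
```

Under these assumptions a view lands in the space folder marts.finance, while a table lands under the same folder path inside object storage.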

However, the dbt-dremio package automatically creates a view for every materialized table in the dremio_space_folder specified in profiles.yml. This cannot be overridden via a macro.

I would like a dynamic way to specify where views pointing at materialized tables are saved in Dremio. I believe the simplest way to do this would be to have custom generate_schema_name macros ALSO apply to the schema of the view pointing at a materialized table, not just to pure views.
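The difference between the current and the requested behavior can be sketched in a few lines of Python. This is a hypothetical illustration only and says nothing about how dbt-dremio is implemented internally: the function name view_schema_for_table, the flag apply_custom_schema, and the sample values are assumptions invented for the example.

```python
def view_schema_for_table(original_file_path, dremio_space_folder, apply_custom_schema):
    """Return where the view pointing at a materialized table would be saved.

    apply_custom_schema is a hypothetical switch standing in for the requested
    behavior; it does not correspond to an existing dbt-dremio option.
    """
    if apply_custom_schema:
        # Requested: reuse the same folder-derived schema that pure views get.
        return ".".join(original_file_path.split("/")[1:-1])
    # Current: every such view goes to the folder configured in profiles.yml.
    return dremio_space_folder


path = "models/marts/finance/orders.sql"  # hypothetical model file
current = view_schema_for_table(path, "analytics", apply_custom_schema=False)
requested = view_schema_for_table(path, "analytics", apply_custom_schema=True)
print(current)    # analytics
print(requested)  # marts.finance
```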

Justification for this enhancement

Overriding the built-in generate_schema_name macro is a very common practice in dbt projects. Because the location of views pointing to tables cannot be customized the same way, a user must manually specify a schema for each view that corresponds to a materialized table, which is cumbersome and difficult to maintain with large model folders.

Being able to dynamically set locations for views pointing at materialized tables would make for a much friendlier experience for developers and maintainers of a dbt Core repo.