duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
881 stars 78 forks source link

Writing to an SQLite database #291

Closed Nintorac closed 10 months ago

Nintorac commented 10 months ago

Hey, great project love the versatility it offers!

I am having an issue trying to write to a SQLite database, I have the following configuration, and I get this error SQLite databases do not support creating new schemas.

Am I doing something stupid or is this not possible to do?

Thanks!

dbt_project.yml

models:
  raddd_dbt:
    +materialized: table
    render_midis:
      +materialized: incremental
      +database: audio

profiles.yml

raddd_dbt:
  outputs:
    dev:
      type: duckdb
      path: ../data/dev.duckdb
      threads: 1
      attach:
        - path: ../data/audio.db
          type: sqlite
  target: dev
jwills commented 10 months ago

Yeah my guess here is that dbt-duckdb is trying to create a schema in the SQLite database that it is writing to, since that is usually what it does when it’s writing a DuckDB model. There is almost certainly a way to turn that off and have dbt-duckdb just write the database + table name, but I’m on the road and I’m not sure I could figure it out from a phone— sorry!

Nintorac commented 10 months ago

Hmm, ok thanks! I will keep digging :)

Enjoy your road trip!! (if that's why you're on the road)

jwills commented 10 months ago

Yeah looked at this a bit more now that I'm at a computer and there doesn't seem to be a great way to do it-- i.e., sqlite never wants a schema, and dbt always wants a schema on a table. Maybe try creating a view model in DuckDB and then adding a post-hook that does the CREATE TABLE <sqlitedb>.<table_name> AS SELECT * FROM {{ this }} trick?

jwills commented 10 months ago

the only other way I could think to do it is with a custom materialization, but that seems like a lot of work

Nintorac commented 10 months ago

Thanks for looking into it!

adding a post-hook

for my situation this won't work, the model is Python based so only allows incremental or table and the model is too large to create in a single call so I need incremental

custom materialization

yea, that does seem like a lot of work, especially since you would probably need to implement one for every "real" DBT materialisation method.


I'm actually a bit confused about why it would be trying to create the schema in the first place, I would have thought by deafult that it would be trying to write to the main schema which should already exist so creation should not be necessary.

I tried forcing it to use main by modifying the config like this

    render_midis:
      +materialized: incremental
      +database: audio
      +schema: main

But that results in the same error.

I then made a downstream dummy SQL table that looks like this

SELECT * from {{ref("render_midis")}} 

Then compiled it, and got this as a result

SELECT * from "audio"."main_main"."render_midis"

Then just to see what would happen I also tried modifying the schema to something else, audio in my test, so the config was like this

    render_midis:
      +materialized: incremental
      +database: audio
      +schema: audio

That compiles to

SELECT * from "audio"."main_audio"."render_midis"

Not sure what to make of that myself, any ideas?

Nintorac commented 10 months ago

Ooh, now I tried with this config (deleted the schema setting)

    render_midis:
      +materialized: incremental
      +database: audio

and the result is

SELECT * from "audio"."main"."render_midis"

Then running this query against DuckDB will successfully create the table

ATTACH 'audio.db' (TYPE sqlite);
CREATE TABLE "audio"."main"."hello" AS SELECT 1

So maybe there just needs to be a conditional here that checks if the target is a SQLite and skips the call if so. Maybe also it should assert that the schema option is unset?

And the compilation SELECT * from "audio"."main_main"."render_midis" is expected DBT behaviour ..lol I think I knew that at one point 😅

jwills commented 10 months ago

Ah, the dbt custom schema mistake strikes again! I should have known! 😉

jwills commented 10 months ago

(Like any useful piece of technology, dbt got 95% of things right that you never notice/think about, and 5% of things wrong that cause pain and suffering for everyone forever)

Nintorac commented 10 months ago

So this seems to do the job :)

My first jinja script so open to any comments/criticisms. Happy to open a PR too

{% macro duckdb__create_schema(relation) -%}
  {%- call statement('create_schema') -%}
    {% set sql %}
        select type from duckdb_databases()
        where database_name='{{ relation.database }}'
        and type='sqlite'
    {% endset %}
    {% set results = run_query(sql) %}
    {% if results|length == 0 %}
        create schema if not exists {{ relation.without_identifier() }}
    {% else %}
        {% if relation.schema!='main' %}
            {{ exceptions.raise_compiler_error(
                "Schema must be 'main' when writing to sqlite "
                ~ "instead got " ~ relation.schema
            )}}
        {% endif %}
    {% endif %}
  {%- endcall -%}
{% endmacro %}