duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
816 stars 69 forks source link

Exception: DuckDBConnectionManager environment requested before creation! when using SQLAlchemy plugin #176

Closed muscovitebob closed 1 year ago

muscovitebob commented 1 year ago

Hi, just trying to play with the new plugin system off of master, however I get hit by this error trying to create an SQLAlchemy source table connecting to BigQuery.

My config is as follows:

duckdb_project:
  outputs:
    dev:
      type: duckdb
      path: data/duckdb_project.duckdb
      database: duckdb_project
      plugins:
        - module: sqlalchemy
          alias: bq_sql
          config:
            connection_url: "bigquery://my-gcp-project"

And I have a source config for one extant table in my BigQuery project that is configured like this:

version: 2

sources:
  - name: MyDataset
    tables:
      - name: MyTable
        identifier: my_table
        config:
          plugin: bq_sql
          save_mode: overwrite
          query: "SELECT * FROM MyDataset.{identifier}"

I try to query the source using a new model:

SELECT
   *
FROM
    {{ source(
        "MyDataset",
        "MyTable"
    ) }}

However the following is thrown:

10:03:21  Running with dbt=1.5.1
10:03:22  Unable to do partial parsing because profile has changed
10:03:36  Encountered an error:
DuckDBConnectionManager environment requested before creation!
10:03:36  Traceback (most recent call last):
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 86, in wrapper
    result, success = func(*args, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 142, in wrapper
    return func(*args, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 168, in wrapper
    return func(*args, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 215, in wrapper
    return func(*args, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/cli/requires.py", line 242, in wrapper
    manifest = ManifestLoader.get_full_manifest(
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/parser/manifest.py", line 227, in get_full_manifest
    manifest = loader.load()
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/parser/manifest.py", line 416, in load
    patcher.construct_sources()
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/parser/sources.py", line 84, in construct_sources
    parsed = self.parse_source(patched)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/parser/sources.py", line 188, in parse_source
    parsed_source.relation_name = self._get_relation_name(parsed_source)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/parser/sources.py", line 291, in _get_relation_name
    return str(relation_cls.create_from(self.root_project, node))
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/adapters/base/relation.py", line 259, in create_from
    return cls.create_from_source(node, **kwargs)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/adapters/duckdb/relation.py", line 26, in create_from_source
    DuckDBConnectionManager.env().load_source(plugin_name, source_config)
  File "/Users/databob/opt/miniconda3/envs/py39/lib/python3.9/site-packages/dbt/adapters/duckdb/connections.py", line 27, in env
    raise Exception("DuckDBConnectionManager environment requested before creation!")
Exception: DuckDBConnectionManager environment requested before creation!

Could I perhaps be misconfiguring something here?

I do have all the necessary packages installed in my env for SQLAlchemy to make a BQ connection.

jwills commented 1 year ago

Ugh; I was afraid something like this could happen. If you delete the target directory and do a clean run, does it work?

muscovitebob commented 1 year ago

Just cleaned out my target/ and logs/ folders, as well as deleting the actual duckdb file to let it completely regenerate, and unfortunately yes.

jwills commented 1 year ago

Okay, I don't think I can salvage this functionality as-is, which is kind of good news in that I had been feeling for awhile that I should look really hard at moving this functionality into dbt seed given how it works (i.e., it actually materializes a table in the DuckDB database file).

@muscovitebob really appreciate you trying this out-- I've been distracted by some other things going on right now and hadn't had a chance to really kick the tires here.

muscovitebob commented 1 year ago

Alright, happy to help, thanks for the hard work with this repo and checking this out. For now I'll stick to my original plan of using some python models to fetch BigQuery data which works fine if a bit less elegant than this solution would have been.

AlexanderVR commented 1 year ago

@jwills I ran into this issue as well -- just an issue with the compile phase not having an connection just yet. The attached PR is one quick way around it.

It does illustrate one issue with the dbt testing libraries -- they create the adapter connection up front, hence why the dbt-duckdb functional tests passed.

jwills commented 1 year ago

@AlexanderVR you’re my hero— thank you! I have a separate branch where I’m working on integrating source plugins into dbt seed operations via a hack, but your approach looks much simpler

AlexanderVR commented 1 year ago

:-) But this is still just a quick hack. Because dbt's Relation class seems really only for keeping track of how relation identifiers should be rendered (should really be called RelationRendering), loading data into the db when instantiating this class feels a bit off.

I kind of worry that the goal to autopopulate sources using a plugin will prove to be too large an impedence mismatch against how dbt is designed -- sources are not meant to be maintained by it!

For the dbt seed approach -- is this to enable populating seeds with significant amounts of data? (So that downstream models use ref() instead of source()?)

jwills commented 1 year ago

Mmm, I suppose it depends on what you mean by "significant"-- there are many virtues of using dbt seed as a general way of loading data into a database prior to a dbt run compared to the source hack that I implemented here (e.g., it could take advantage of multiple threads), but I don't think it would ever be a good strategy for moving a very large quantity of data. I was thinking it would be a cool way to move up to like a 1GB or so of data though.

(My general approach to coming up with new features for dbt-duckdb is to read the dbt docs, look for places where they say that such and such is a bad idea, and then try to figure out how to do it.)