Closed: muscovitebob closed this issue 1 year ago
Ugh; I was afraid something like this could happen. If you delete the target directory and do a clean run, does it work?
Just cleaned out my target/ and logs/ folders, and also deleted the actual DuckDB file to let it completely regenerate, and unfortunately the error still occurs.
Okay, I don't think I can salvage this functionality as-is, which is kind of good news in that I had been feeling for a while that I should look really hard at moving this functionality into `dbt seed`, given how it works (i.e., it actually materializes a table in the DuckDB database file).
@muscovitebob really appreciate you trying this out -- I've been distracted by some other things going on right now and hadn't had a chance to really kick the tires here.
Alright, happy to help; thanks for the hard work on this repo and for checking this out. For now I'll stick to my original plan of using some Python models to fetch BigQuery data, which works fine, if a bit less elegant than this solution would have been.
@jwills I ran into this issue as well -- it's just an issue with the compile phase not yet having a connection. The attached PR is one quick way around it.
It does illustrate one issue with the dbt testing libraries -- they create the adapter connection up front, which is why the dbt-duckdb functional tests passed.
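For anyone following along, the shape of the fix is roughly the following (a minimal sketch; the names here are hypothetical and do not correspond to dbt's actual internals): defer any connection-dependent loading until a live connection exists, instead of doing it at compile time.

```python
# Hypothetical sketch of the workaround pattern; class and method names
# are illustrative only, not dbt's actual internals.

class PluginBackedRelation:
    """A relation whose backing table is populated by a source plugin."""

    def __init__(self, loader):
        self._loader = loader  # callable that loads data over a connection
        self._loaded = False

    def ensure_loaded(self, connection):
        # During compile there is no connection yet, so skip loading
        # instead of raising; the data gets materialized at run time,
        # when a real connection is available.
        if connection is None or self._loaded:
            return
        self._loader(connection)
        self._loaded = True
```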
@AlexanderVR you're my hero -- thank you! I have a separate branch where I'm working on integrating source plugins into `dbt seed` operations via a hack, but your approach looks much simpler :-)
But this is still just a quick hack. Because dbt's `Relation` class really seems to exist only to keep track of how relation identifiers should be rendered (it should really be called `RelationRendering`), loading data into the database when instantiating this class feels a bit off. I kind of worry that the goal of auto-populating sources using a plugin will prove to be too large an impedance mismatch with how dbt is designed -- sources are not meant to be maintained by it!
For the `dbt seed` approach -- is this to enable populating seeds with significant amounts of data? (So that downstream models use `ref()` instead of `source()`?)
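Concretely, the downstream difference would look something like this (a sketch; the model and source names are made up):

```sql
-- If the data is loaded via `dbt seed`, a downstream model selects from the seed:
select * from {{ ref('customers_seed') }}

-- If it is materialized by a source plugin, it selects from the source instead:
-- select * from {{ source('bq_source', 'customers') }}
```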
Mmm, I suppose it depends on what you mean by "significant" -- there are many virtues to using `dbt seed` as a general way of loading data into a database prior to a `dbt run`, compared to the `source` hack that I implemented here (e.g., it could take advantage of multiple threads), but I don't think it would ever be a good strategy for moving a very large quantity of data. I was thinking it would be a cool way to move up to 1GB or so of data, though.
(My general approach to coming up with new features for dbt-duckdb is to read the dbt docs, look for places where they say that such and such is a bad idea, and then try to figure out how to do it.)
Hi, I'm just trying to play with the new plugin system off of master; however, I get hit by this error when trying to create an SQLAlchemy source table connecting to BigQuery.
My config is as follows:
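Roughly along these lines (a minimal sketch; the profile name, file path, project, and dataset are placeholders, and the connection URL assumes the sqlalchemy-bigquery driver):

```yaml
# profiles.yml -- placeholder names throughout
my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: local.duckdb
      plugins:
        - module: sqlalchemy
          alias: sql
          config:
            connection_url: "bigquery://my-gcp-project/my_dataset"
```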
And I have a source config for one extant table in my BigQuery project that is configured like this:
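Along these lines (the source and table names are placeholders; this assumes the plugin is attached to the source via the `plugin` key in its `meta`):

```yaml
# models/sources.yml -- placeholder names throughout
version: 2

sources:
  - name: bq_source
    meta:
      plugin: sql
    tables:
      - name: my_table
```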
I try to query the source using a new model:
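Something like this (the model name is made up):

```sql
-- models/stg_my_table.sql
select * from {{ source('bq_source', 'my_table') }}
```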
However the following is thrown:
Could I perhaps be misconfiguring something here?
I do have all the necessary packages installed in my env for SQLAlchemy to make a BQ connection.