Open JRocki opened 3 months ago
The rub here is that the created DuckDB file won't have the S3 access key/secret present to be able to read the tables that are backed by files in S3 (the results of the tests are just regular DuckDB tables, which is why you can query them). You should be able to read the S3-backed tables by loading the `aws` extension and running `CALL load_aws_credentials();` to load your credentials.
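For instance, in a fresh session opened against the generated file, something like this should work (a sketch — the table name is taken from the issue, and the `httpfs` install is an assumption about what's needed to read from S3):

```sql
-- Opened against the dbt-generated file, e.g.: duckdb db.duckdb
INSTALL aws;     -- one-time install of the aws extension
INSTALL httpfs;  -- required for reading s3:// paths
LOAD aws;
LOAD httpfs;

-- Resolves key/secret/region from the usual AWS sources
-- (environment variables, ~/.aws/credentials, etc.)
CALL load_aws_credentials();

-- The S3-backed table should now resolve
SELECT * FROM main.my_model;
```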
To avoid needing to do this, you can use the new `CREATE SECRET` construct via a dbt macro, and then DuckDB will store the credentials in an encrypted format in a special `.duckdb` directory in home: https://duckdb.org/docs/sql/statements/create_secret.html
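A minimal sketch of that macro approach (the macro and secret names are hypothetical; `PROVIDER CREDENTIAL_CHAIN` asks DuckDB to resolve credentials the way the AWS SDK would, and `PERSISTENT` writes the secret to disk so later sessions can use it):

```sql
-- macros/create_s3_secret.sql (hypothetical file/macro name)
{% macro create_s3_secret() %}
CREATE OR REPLACE PERSISTENT SECRET s3_secret (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
{% endmacro %}
```

It can then be wired up as an `on-run-start` hook in dbt_project.yml, e.g. `on-run-start: ["{{ create_s3_secret() }}"]`, so the secret exists before any model runs.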
@jwills That makes sense, thank you!
I am not sure if this is intentional or if there's something on my end, but I am unable to query tables in the DuckDB file that are created during the dbt run. I am using AWS S3/parquet as my 'data lake'. I'm using `duckdb` v0.10.0 and `dbt-duckdb` v1.7.2. Here's my profiles.yml:

The pipeline runs successfully. I get all the expected parquet files in S3. The curious issue is with the duckdb file.
For example, I have a model that creates a table in a `db.duckdb` file: `db.main.my_model`. I also run a dbt test on this model (it checks that values in a column are unique), which creates a table `db.main_dbt_test__audit.unique_column_id`. I like to then use python/duckdb to directly query the tables when I'm debugging or exploring, but it errors when querying `db.main.my_model`:

yields:

```
duckdb.duckdb.HTTPException: HTTP Error: HTTP GET error on 'https://my-bucket.s3.amazonaws.com/pipeline/my_model.parquet' (HTTP 403)
```
However, if I run:

I am able to see the data. Is there something inherently different about tables created via duckdb models? How does the duckdb file know to point to S3? I think the `HTTP 403` is a red herring, because the dbt pipeline runs successfully, meaning it is able to connect to AWS. I'd really appreciate any thoughts on this, thanks!