Open Aaron-Zhou opened 3 months ago
It would be surprising if the problem was that we weren't feeding the s3 credentials to DuckDB given that so many people run with those settings (that is, I feel like I would have heard that complaint from lots of people if that was broken.)
The most common problem I see with these settings is that the region isn't configured correctly (i.e., the data lives in an S3 bucket defined in a region that is different from the one defined in s3_region
.) Of course, if you have a way to make this work using the CREATE SECRET
construct (which it sounds like you do), you can just run that in a macro that is configured using the on-run-start
hook and be on your way.
@jwills Thanks for your reply, I double checked my s3_region and I am pretty sure the value is correct. I also did some testing by adding below config to write files to s3 bucket, it worked without using CREATE SECRET construct. So it proved that my s3 settings were correct. I also don't believe DuckDB has problem with reading files from s3, but could be something from dbt-duckdb adapter caused s3 settings not proper configured on DuckDB so caused reading from s3 bucket not working. Below is the config I used so write table to external files on S3 bucket. Again, writing worked without the CREATE SECRET construct, but reading was not working.
Model.py
def model(dbt, session):
dbt.config(
materialized="external",
location="s3://bucket/parquet/duckdb-sample/pending_orders.parquet",
format="parquet",
options={"partition_by": "customer_id", "overwrite_or_ignore": True},
glue_register=True,
glue_database="duckdb_sample",
)
orders = (
dbt.source("csv", "orders").filter("status='PENDING_PAYMENT'").set_alias("o")
)
customers = dbt.source("csv", "customers").set_alias("c")
pending_orders = orders.join(customers, "o.customer_id=c.id")
return pending_orders
Python model failed: HTTP Error: HTTP GET error on 'https://bucket.s3.amazonaws.com/ms/distribution/csv/orders.csv' (HTTP 403)
profiles.yml:
sources.yml:
models.py: