sacundim opened this issue 1 year ago
huh, k-- these kinds of issues are very hard for me to debug since I don't have ready access to the environment in question. I think the best bet here is to open a Python shell (or run a simple script) that calls the relevant function in dbt-duckdb (which is defined here) and see if we can deduce where the error is coming from, e.g.:
```python
import dbt.adapters.duckdb.credentials as creds

creds._load_aws_credentials()
```
...working on it. I've put together a simple Docker image to try out your approach; gotta get it running in AWS Batch to do the real deal.
Running on Batch prints out a dict with keys `s3_access_key_id`, `s3_secret_access_key`, `s3_session_token`, and `s3_region`. The values are sensitive so I obv can't share them. I did launch duckdb 0.8.1 manually outside of AWS, ran the corresponding SET statements, and I can query from there, so it's something in between. I'll try to extend my Python program somehow to test out more of the bits in between those two parts that work.
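For reference, the SET statements in question look roughly like this in a duckdb 0.8.1 shell (the key names come from the dict above; the values shown are placeholders, and the region is a hypothetical example):

```sql
INSTALL httpfs;
LOAD httpfs;
SET s3_access_key_id = '<access key id>';
SET s3_secret_access_key = '<secret access key>';
SET s3_session_token = '<session token>';
SET s3_region = 'us-east-1';  -- placeholder region
```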
I tried the following inside a Fargate container:
```python
import dbt.adapters.duckdb.credentials as creds
import duckdb

credentials = creds._load_aws_credentials()
print(f'credentials keys = {credentials.keys()}')

connection = duckdb.connect()
cursor = connection.cursor()
cursor.execute('INSTALL httpfs')
cursor.execute('LOAD httpfs')
for key, value in credentials.items():
    cursor.execute(f"SET {key} = '{value}'")
```
...and ran a query like the one my dbt project gets the error for, but it works fine. Maybe elsewhere the adapter is doing something that interferes with this? I looked at e.g. `DuckDBConnectionWrapper`, but I can't spot anything untoward.
Hrm-- maybe related to this? https://github.com/duckdb/duckdb/issues/6563
My apologies; it turns out my reproduction efforts failed to reproduce one of the elements of the original failure: the jobs with the errors are running in an ECS cluster with EC2 nodes, but my earlier reproduction attempts ran in Fargate.
I see this perhaps crucial difference: in Fargate, the `_load_aws_credentials()` call returns four keys: `s3_access_key_id`, `s3_secret_access_key`, `s3_session_token`, and `s3_region`. In the ECS cluster with EC2 nodes, `s3_region` is missing! And I can reproduce the HTTP 400 outside of AWS by not setting the `s3_region`.
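A minimal sketch of how that difference could play out (the field names and mapping step here are hypothetical, not the actual dbt-duckdb code): if the credential lookup skips unset fields, it would silently drop `s3_region` on a node where the metadata path doesn't report a region:

```python
def to_duckdb_settings(creds):
    # Map provider-style credential fields to duckdb httpfs SET keys,
    # dropping any field that came back unset. Hypothetical sketch,
    # not the actual dbt-duckdb implementation.
    mapping = {
        "access_key": "s3_access_key_id",
        "secret_key": "s3_secret_access_key",
        "token": "s3_session_token",
        "region": "s3_region",
    }
    return {dk: creds[sk] for sk, dk in mapping.items() if creds.get(sk)}

# On Fargate the provider reports a region...
fargate = {"access_key": "AK", "secret_key": "SK", "token": "TK", "region": "us-east-1"}
# ...but on an ECS-on-EC2 node it may come back as None:
ec2 = {"access_key": "AK", "secret_key": "SK", "token": "TK", "region": None}

print(sorted(to_duckdb_settings(fargate)))
print(sorted(to_duckdb_settings(ec2)))  # no s3_region key at all
```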
Ah, good to know-- and nice detective work!
Thinking I should add some logging in that `_load_aws_credentials` function to note which keys were set via the STS token call (tho obviously not the values) to help future folks track down these kinds of problems
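A sketch of what that logging might look like (the helper name and logger name are my own; the real change would live inside `_load_aws_credentials`):

```python
import logging

logger = logging.getLogger("dbt.adapters.duckdb.credentials")

def log_credential_keys(credentials: dict) -> list:
    """Log which credential keys were populated, but never their values."""
    keys = sorted(credentials)
    logger.info("AWS credential lookup returned keys: %s", keys)
    return keys

# e.g. on a node where the region lookup fails:
print(log_credential_keys({
    "s3_access_key_id": "…", "s3_secret_access_key": "…", "s3_session_token": "…",
}))
# → ['s3_access_key_id', 's3_secret_access_key', 's3_session_token']
```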
...and also that it's possible that this extension may run into some of the same issues: https://github.com/duckdblabs/duckdb_aws
I've just confirmed a working workaround for the issue:

```yaml
use_credential_provider: aws
settings:
  # In theory this shouldn't be necessary:
  s3_region: "{{ env_var('S3_REGION') }}"
```
Actually, I think we have a bug in the httpfs extension here. Its requests to the S3 endpoint are erroring with inscrutable errors in a scenario where other tools (most notably boto3 and the official AWS CLI) work fine. I wonder, e.g., if it's sending an empty string for the region when it's supposed to either send none or send a valid one.
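One way an empty region string could surface as an HTTP 400: AWS Signature Version 4 embeds the region in the credential scope of the signed request, so an empty string yields a malformed scope that S3 would reject. A small illustration (the scope format is per the SigV4 spec; the interpretation of the 400 is my guess, and the date is an arbitrary example):

```python
from datetime import date

def credential_scope(region: str, service: str = "s3", day: date = date(2023, 8, 1)) -> str:
    # SigV4 credential scope: <yyyymmdd>/<region>/<service>/aws4_request
    return f"{day.strftime('%Y%m%d')}/{region}/{service}/aws4_request"

print(credential_scope("us-east-1"))  # 20230801/us-east-1/s3/aws4_request
print(credential_scope(""))           # 20230801//s3/aws4_request  (empty region field)
```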
When trying to use the `aws` target in the linked profile, either from an ECS container or from an EC2 instance that's known to have the correct permissions, we nevertheless get an HTTP 400 error. But if on the same EC2 instance I instead configure it this way, with credentials I get from `aws sts get-session-token`, it works:
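For anyone reproducing this by hand, the JSON that `aws sts get-session-token` prints can be turned into the corresponding SET statements like so (all credential values below are fake placeholders; note that STS does not return a region, so `s3_region` has to come from elsewhere, e.g. an environment variable):

```python
import json

# Shape of `aws sts get-session-token` output (all values here are fake placeholders).
sts_output = json.loads("""
{"Credentials": {"AccessKeyId": "ASIAEXAMPLE",
                 "SecretAccessKey": "secretEXAMPLE",
                 "SessionToken": "tokenEXAMPLE",
                 "Expiration": "2099-01-01T00:00:00Z"}}
""")

c = sts_output["Credentials"]
settings = {
    "s3_access_key_id": c["AccessKeyId"],
    "s3_secret_access_key": c["SecretAccessKey"],
    "s3_session_token": c["SessionToken"],
    "s3_region": "us-east-1",  # hypothetical; STS does not report a region
}
statements = [f"SET {key} = '{value}';" for key, value in settings.items()]
print("\n".join(statements))
```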