duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License

CORS error when accessing Parquet from S3 #838

Status: Open. hustnn opened this issue 2 years ago.

hustnn commented 2 years ago

I am using https://shell.duckdb.org/ to read a Parquet file from S3:

SET s3_region='us-east-1';
SET s3_access_key_id='xxx';
SET s3_secret_access_key='xxx';
select count(*) from read_parquet('s3://xxx/part-1.gz.parquet');

I also followed https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html to configure CORS on the bucket. However, I still get this error:

duckdb-browser-eh.worker.9db3d2ba08abf891a1f4.js:2 Access to XMLHttpRequest at 'https://eth-chain.s3.amazonaws.com/xxx/part-1.gz.parquet' from origin 'https://shell.duckdb.org' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.
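The browser reports this error when the preflight (OPTIONS) response carries no Access-Control-Allow-Origin header, which is what S3 does when no CORS rule on the bucket matches the request's origin, its requested method, and every header named in Access-Control-Request-Headers. The TypeScript sketch below models that matching so a rule set can be checked offline. The field names mirror S3's CORS JSON; the helper names, the simplified matching (exact values or patterns with one *, header names compared case-insensitively) and the header list assumed for a signed, ranged read are illustrative assumptions, not S3's or duckdb-wasm's actual code.

```typescript
// Field names follow S3's CORS rule JSON; the matching below is a simplified model.
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: string[];
  AllowedHeaders?: string[];
}

// What the browser sends in a preflight (OPTIONS) request.
interface Preflight {
  origin: string; // Origin header
  method: string; // Access-Control-Request-Method
  requestHeaders: string[]; // names listed in Access-Control-Request-Headers
}

// Matches value against a pattern that contains at most one "*".
function wildcardMatch(pattern: string, value: string): boolean {
  const star = pattern.indexOf("*");
  if (star === -1) return pattern === value;
  const prefix = pattern.slice(0, star);
  const suffix = pattern.slice(star + 1);
  return (
    value.length >= prefix.length + suffix.length &&
    value.startsWith(prefix) &&
    value.endsWith(suffix)
  );
}

// Hypothetical helper: true if at least one rule admits the whole preflight.
function preflightAllowed(rules: CorsRule[], req: Preflight): boolean {
  return rules.some((rule) => {
    const originOk = rule.AllowedOrigins.some((p) => wildcardMatch(p, req.origin));
    const methodOk = rule.AllowedMethods.includes(req.method.toUpperCase());
    // Header names are compared case-insensitively (assumption).
    const allowed = (rule.AllowedHeaders ?? []).map((h) => h.toLowerCase());
    const headersOk = req.requestHeaders.every((h) =>
      allowed.some((p) => wildcardMatch(p, h.toLowerCase()))
    );
    return originOk && methodOk && headersOk;
  });
}

// A rule that allows the origin and methods but declares no headers.
const rules: CorsRule[] = [
  { AllowedOrigins: ["https://shell.duckdb.org"], AllowedMethods: ["GET", "HEAD"] },
];
// Assumed headers for a signed, ranged read; check your own network tab.
const req: Preflight = {
  origin: "https://shell.duckdb.org",
  method: "GET",
  requestHeaders: ["authorization", "range", "x-amz-date"],
};
console.log(preflightAllowed(rules, req)); // false: the headers are not allowed
```

A rule can allow the right origin and methods and still fail the preflight, because every requested header must be covered as well.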

Mause commented 2 years ago

Are you able to see what the actual response is in the network tab of your browser's devtools? You might actually be getting an auth error, for example.
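Besides the network tab, the preflight can be replayed outside the browser to see S3's raw answer. A minimal sketch, assuming Node 18+ with its global fetch, and assuming that fetch does not strip a script-set Origin header (browsers forbid setting it). buildPreflight and probe are hypothetical helpers; the URL, origin and header names are placeholders to copy from your own devtools.

```typescript
type PreflightInit = { method: string; headers: Record<string, string> };

// Builds the request a browser would send as a CORS preflight.
function buildPreflight(origin: string, method: string, requestHeaders: string[]): PreflightInit {
  return {
    method: "OPTIONS",
    headers: {
      Origin: origin,
      "Access-Control-Request-Method": method,
      // Requested header names, lower-cased and comma-separated.
      "Access-Control-Request-Headers": requestHeaders.map((h) => h.toLowerCase()).join(","),
    },
  };
}

// Sends the preflight and prints what the browser's CORS check looks at.
async function probe(url: string, init: PreflightInit): Promise<void> {
  const res = await fetch(url, init);
  console.log("status:", res.status);
  console.log("access-control-allow-origin:", res.headers.get("access-control-allow-origin"));
  console.log(await res.text()); // error body, if S3 sends one
}

// Example (placeholders, not run here):
// probe("https://<bucket>.s3.amazonaws.com/<key>",
//   buildPreflight("https://shell.duckdb.org", "GET", ["authorization", "range"]));
```

A 403 with no access-control-allow-origin header points at the CORS rules; a 200 that does carry the header means the preflight passes and the failure lies in the actual GET or HEAD request.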

hustnn commented 2 years ago
(Three screenshots of the responses were attached here; the images are not preserved.)

hustnn commented 2 years ago

@Mause I attached the response above.

Mause commented 2 years ago

@hustnn As you can see in the screenshot, your actual issue is an authentication one. I would suggest making sure your AWS credentials can actually access that S3 bucket.

hustnn commented 2 years ago

@Mause Actually, I can access S3 the same way (setting s3_region, s3_access_key_id and s3_secret_access_key) with the DuckDB CLI, so I think S3 is accessible. Are you able to access Parquet files in S3 from https://shell.duckdb.org/?

Mause commented 2 years ago

Perhaps verify in the network tab that the request headers being sent are the expected ones, and that you have declared those headers correctly in your S3 CORS config?
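The comparison suggested here can be done mechanically: copy the Access-Control-Request-Headers value of the failing OPTIONS request from the network tab and check it against the bucket's AllowedHeaders. A minimal sketch; missingHeaders is a hypothetical helper, the sample header value is illustrative rather than captured from duckdb-wasm, and the matching (case-insensitive names, at most one * per pattern) is a simplified model of S3's behavior.

```typescript
// Hypothetical helper: which requested headers are not covered by AllowedHeaders?
function missingHeaders(requestHeadersValue: string, allowedHeaders: string[]): string[] {
  const allowed = allowedHeaders.map((h) => h.trim().toLowerCase());
  // A pattern covers a name exactly, or via a single "*" wildcard.
  const covers = (pattern: string, name: string): boolean => {
    const star = pattern.indexOf("*");
    if (star === -1) return pattern === name;
    const prefix = pattern.slice(0, star);
    const suffix = pattern.slice(star + 1);
    return (
      name.length >= prefix.length + suffix.length &&
      name.startsWith(prefix) &&
      name.endsWith(suffix)
    );
  };
  return requestHeadersValue
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0)
    .filter((h) => !allowed.some((p) => covers(p, h)));
}

// Illustrative value as it might appear on the OPTIONS request in the network tab.
const requested = "authorization,range,x-amz-content-sha256,x-amz-date";
const missing = missingHeaders(requested, ["Authorization", "Range"]);
console.log(missing); // the two x-amz-* headers are not yet allowed
```

An AllowedHeaders entry such as x-amz-* would cover both missing names in this example without opening AllowedHeaders to everything.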

tobilg commented 2 years ago

I currently have exactly the same issue, @hustnn. The auth headers are not sent with the OPTIONS request, and consequently it gets a 403. Have you been able to solve this?

Mause commented 2 years ago

Before you can make cross-origin requests to your bucket, you have to add a CORS configuration: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html
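For reference, a CORS configuration for reading from https://shell.duckdb.org might look like the following. It is written as a typed TypeScript object and printed as JSON: the S3 console's CORS editor takes a JSON array of rules, while aws s3api put-bucket-cors expects the same array under a CORSRules key. The allowed headers, exposed headers and max age are assumptions for signed, ranged reads, not values documented by duckdb-wasm; adjust them to what your network tab shows.

```typescript
// Field names and method values follow S3's CORS rule JSON.
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: ("GET" | "PUT" | "POST" | "DELETE" | "HEAD")[];
  AllowedHeaders?: string[];
  ExposeHeaders?: string[];
  MaxAgeSeconds?: number;
}

// Assumed values: one origin, read-only methods, headers of a signed ranged read.
const corsRules: CorsRule[] = [
  {
    AllowedOrigins: ["https://shell.duckdb.org"],
    AllowedMethods: ["GET", "HEAD"],
    AllowedHeaders: ["authorization", "range", "x-amz-*"],
    // Response headers the page's script may read (assumption).
    ExposeHeaders: ["Content-Length", "Content-Range", "ETag"],
    MaxAgeSeconds: 3000,
  },
];

// S3 console CORS editor format: a JSON array of rules.
console.log(JSON.stringify(corsRules, null, 2));
// put-bucket-cors format: the same array under "CORSRules".
console.log(JSON.stringify({ CORSRules: corsRules }, null, 2));
```

With the CLI, save the second form to a file and pass it as --cors-configuration file://cors.json.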

tobilg commented 2 years ago

Yes, I did. I experimented with different settings regarding allowed methods and headers. The only way I could get it to work was adding * to both, which isn’t ideal but works for now. Thanks!
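When experimenting with wildcards like this, two limits of S3's CORS rules are easy to trip over: S3 documents only GET, PUT, POST, DELETE and HEAD as AllowedMethods values, and each AllowedOrigins or AllowedHeaders entry may contain at most one * wildcard. The sketch below encodes those two limits as a quick lint for a rule before uploading it; lintCorsRule is a hypothetical helper and covers only these checks, not S3's full validation.

```typescript
interface CorsRule {
  AllowedOrigins: string[];
  AllowedMethods: string[];
  AllowedHeaders?: string[];
}

// Method values S3 documents for AllowedMethods.
const S3_METHODS = new Set(["GET", "PUT", "POST", "DELETE", "HEAD"]);

// Hypothetical linter: reports entries that break the two limits above.
function lintCorsRule(rule: CorsRule): string[] {
  const problems: string[] = [];
  for (const m of rule.AllowedMethods) {
    if (!S3_METHODS.has(m)) problems.push(`unsupported method: ${m}`);
  }
  const tooManyStars = (s: string): boolean => s.split("*").length - 1 > 1;
  for (const o of rule.AllowedOrigins) {
    if (tooManyStars(o)) problems.push(`more than one * in origin: ${o}`);
  }
  for (const h of rule.AllowedHeaders ?? []) {
    if (tooManyStars(h)) problems.push(`more than one * in header: ${h}`);
  }
  return problems;
}

// "*" is fine in origins and headers, but not as a method value.
const example = lintCorsRule({
  AllowedOrigins: ["*"],
  AllowedMethods: ["GET", "HEAD", "*"],
  AllowedHeaders: ["*"],
});
console.log(example);
```

To allow every method, list all five values explicitly rather than using *.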

chriszrc commented 1 year ago

Oddly, I couldn't get this to work with the Wasm shell, but once I installed the duckdb-wasm library in my local app, it worked just fine with the same S3 Parquet files.

medeirosjoaquim commented 7 months ago

I'm getting the same error with a local MinIO. I allowed all methods and '*' in the config, and requests are still blocked by CORS with the same error as in the first comment.

dude0001 commented 7 months ago

@medeirosjoaquim, can you share your config?

Haklim733 commented 3 months ago

@dude0001 I am running into the same issue, even after making my bucket public and setting up CORS to allow all origins. I am currently on duckdb-wasm 1.28.1-dev106.0. The error I always get is "FAIL WITH: HEAD and GET requests failed:". It doesn't work on shell.duckdb.org either. Let me know how I can help debug.

dude0001 commented 3 months ago

@Haklim733 what do you see in the S3 access logs? Can you share your exact CORS policy?

Haklim733 commented 3 months ago

@dude0001 I made a silly mistake: I hadn't set the bucket policy to allow s3:ListBucket. It works now.
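The fix above depends on which resource each action is granted on: s3:ListBucket applies to the bucket itself (arn:aws:s3:::bucket), while s3:GetObject applies to the objects inside it (arn:aws:s3:::bucket/*). Without s3:ListBucket, S3 answers requests for a missing key with 403 rather than 404, and listing-based reads such as glob patterns presumably cannot run, which is one plausible reason the HEAD and GET requests failed. A minimal sketch of a public-read bucket policy, assuming anonymous read access is really intended (as for the public bucket described above); publicReadPolicy and my-bucket are hypothetical placeholders.

```typescript
// Hypothetical builder for a public-read bucket policy.
function publicReadPolicy(bucket: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "ListBucket",
        Effect: "Allow",
        Principal: "*",
        Action: "s3:ListBucket",
        // ListBucket is granted on the bucket ARN...
        Resource: `arn:aws:s3:::${bucket}`,
      },
      {
        Sid: "GetObjects",
        Effect: "Allow",
        Principal: "*",
        Action: "s3:GetObject",
        // ...while GetObject is granted on the object ARNs.
        Resource: `arn:aws:s3:::${bucket}/*`,
      },
    ],
  };
}

const policy = publicReadPolicy("my-bucket");
console.log(JSON.stringify(policy, null, 2));
```

The printed JSON can be pasted into the bucket policy editor; the bucket's Block Public Access settings must also permit public bucket policies for it to take effect.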