Open coji opened 1 month ago
Using curl, in native:
% curl -I 'https://91ff95bcb91fbfa1b1c5c356262b1fe4.r2.cloudflarestorage.com/techtalk/world_populations.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0d9126cf0fed3ae3c00f20ceb2bb97c3%2F20240914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240914T091120Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=1bddf8fcc77e83aa20ffa827e771cea7310af373354af06c5ac58f2e181f0182'
HTTP/1.1 403 Forbidden
Date: Sat, 14 Sep 2024 11:09:08 GMT
Content-Type: text/plain;charset=UTF-8
Connection: keep-alive
Server: cloudflare
CF-RAY: 8c2fec512cac66ea-AMS
So I would interpret that as the HEAD request not being valid at all, it's not a CORS problem but a problem with presigned URLs that I think do allow only GET methods (I think, unsure, this one for sure).
I would need to check this in the shell, sounds possible an error should be dealt with better and recover from it
I'm having the same issue with R2. I think it might be related to the presigned URL only being valid for whatever method you've specified to access the data. In duckdb wasm when we read or attach to a file in most cases a HEAD and then a GET request are sent. There's a few other issues talking about this https://github.com/duckdb/duckdb-wasm/issues/699. Curious if there's any solution?
Sounds like a possible work around is to create a modified version which sends a ranged GET request first instead of the HEAD request. At which point only GET requests would be sent.
Maybe check the following: https://stackoverflow.com/a/76836943
Also, can you access the presigned URL via curl w/o Problem?
@tobilg can't curl mine either... sorry if URL expires.
curl -I "https://01bec96ddf135b4f6636692059641ffe.r2.cloudflarestorage.com/scrapeable-data/db_test.duckdb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f7f92bbce532f6ead44c019c43921a18%2F20240914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240914T184231Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=486f72d518388b248d5d9955743ab7a5eb2023c5ef9723c0d5c6afa132b923ac"
HTTP/1.1 403 Forbidden
Date: Sat, 14 Sep 2024 18:45:21 GMT
Content-Type: text/plain;charset=UTF-8
Connection: keep-alive
Server: cloudflare
CF-RAY: 8c3288989eee8fb8-BOS
Maybe I need to get rid of special characters in my bucket and .duckdb in my object.
It looks like this PR should've fixed the issue https://github.com/duckdb/duckdb-wasm/issues/1367 to https://github.com/duckdb/duckdb-wasm/pull/1717
Have you checked adding the Content-Range
header, as outline https://github.com/duckdb/duckdb-wasm/pull/1717#issuecomment-2090268399
Have you checked adding the
Content-Range
header, as outline #1717 (comment)
Thank you for your confirmation. I added the following setting to CORS:
"ExposeHeaders": ["*"]
but it still results in an error at the time of the HEAD request. I've also updated the issue itself with the above information.
this is huge!
@coji I tried out this example app from @e1arikawa using your above example but still cannot get it to work. So maybe it is a CORS issue, or this app is incomplete for testing this, or the PR doesn't work for our desired use case?
Either way maybe give it a try too and see if you can query your S3 parquet file. I still see
duckdb-browser-eh.worker.js:1 Access to XMLHttpRequest at 'https://91ff95bcb91fbfa1b1c5c356262b1fe4.r2.cloudflarestorage.com/techtalk/world_populations.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=0d9126cf0fed3ae3c00f20ceb2bb97c3%2F20240914%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240914T091120Z&X-Amz-Expires=604800&X-Amz-SignedHeaders=host&X-Amz-Signature=1bddf8fcc77e83aa20ffa827e771cea7310af373354af06c5ac58f2e181f0182' from origin 'http://127.0.0.1:5500' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
What happens?
When executing a SELECT statement in DuckDB-Wasm on a data source accessed via a pre-signed URL (especially those created for GET requests), the operation fails due to CORS errors. This prevents querying data stored in locations that require pre-signed URLs for access.
To Reproduce
Example SQL query:
Note: I tried this query on shell.duckdb.org, and it failed to access the data.
Additional context: The current behavior seems to be:
xhr.send(null)
, which is not caught.This behavior was observed both in a local DuckDB-Wasm implementation and on shell.duckdb.org.
Importantly, the bucket's CORS policy is set according to the documentation:
Despite this CORS policy allowing both GET and HEAD methods from any origin, the issue persists. This suggests that the problem might be related to how DuckDB-Wasm handles the pre-signed URLs rather than the bucket's CORS configuration.
A possible solution might be to skip the HEAD request for pre-signed URLs or implement exception handling to proceed with the range GET request even if the HEAD request fails.
Browser/Environment:
Chrome 128.0.6613.138
Device:
M2 Macbook Air
DuckDB-Wasm Version:
1.28.1-dev278.0
DuckDB-Wasm Deployment:
shell.duckdb.org
Full Name:
Koji Mizoguchi
Affiliation:
TechTalk Inc.