duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License
1.03k stars 113 forks source link

Invalid Error: missing , or ] when using st_read function in Firefox #1580

Open daviewales opened 5 months ago

daviewales commented 5 months ago

What happens?

I tried to run the sample code in DuckDB WASM linked from the latest blog post.

When I clicked this link, I got the error Invalid Error: missing , or ]: https://shell.duckdb.org/#queries=v0,INSTALL-spatial~,LOAD-spatial~,CREATE-TABLE-nyc-AS-SELECT-borough%2C-st_union_agg(geom)-AS-full_geom%2C-st_area(full_geom)-AS-area%2C-st_centroid(full_geom)-AS-centroid%2C-count(*)-AS-count-FROM-st_read('https%3A%2F%2Fraw.githubusercontent.com%2Fduckdb%2Fduckdb_spatial%2Fmain%2Ftest%2Fdata%2Fnyc_taxi%2Ftaxi_zones%2Ftaxi_zones.shp')-GROUP-BY-borough~,SELECT-borough%2C-area%2C-centroid%3A%3AVARCHAR%2C-count-FROM-nyc~

But this error only occurs in Firefox. It works perfectly in Chrome. I'm using the latest Firefox 121.0.1.

To Reproduce

Click the following link in Firefox:

https://shell.duckdb.org/#queries=v0,INSTALL-spatial~,LOAD-spatial~,SELECT-borough-FROM-st_read('https%3A%2F%2Fraw.githubusercontent.com%2Fduckdb%2Fduckdb_spatial%2Fmain%2Ftest%2Fdata%2Fnyc_taxi%2Ftaxi_zones%2Ftaxi_zones.shp')-limit-1~

Browser/Environment:

Firefox 121.0.1

Device:

HP Probook Laptop

DuckDB-Wasm Version:

1.28.1-dev81.0

DuckDB-Wasm Deployment:

shell.duckdb.org

Full Name:

David Wales

Affiliation:

South Western Sydney Primary Health Network

carlopi commented 5 months ago

Here the problem is that Firefox takes a different (than Safari or Google/Edge) take on what's the Content-Length returned from a plain-text resource with gzip encoding.

Chrome: image

Firefox: image

And we currently assume this Content-Length to be the actual content length of the uncompressed original file.

I am not sure what can be done here, and need to test this in node or other relevant non-browser embeddings.

daviewales commented 5 months ago

From what I can tell, it looks like Firefox is following the spec correctly (RFC 9110). (Unless I've misread it.)

In this case, you should be able to get the uncompressed size by reading the ISIZE field from the gzip header. See RFC 1952.

Looking at the screenshots above, I can see that content-encoding=gzip in the Firefox headers. However, I can't see that in the Chrome headers. Is it just not visible in the screenshot, or has the data been sent uncompressed to Chrome? This would explain why the content-length is different between Chrome and Firefox.

daviewales commented 5 months ago

Also, strangely I can't reproduce the issue in Firefox on my Mac. I'm running Firefox 115 ESR on my Mac.

carlopi commented 5 months ago

Chrome is playing man-on-the-middle both on HEAD and GET requests, so basically to the JavaScript-side the fact that there was compression it's not really visible.

Firefox appear to be playing man-on-the-middle only on GET requests, so I still think that while this is very strange, it's Firefox behaviour here to be off. I have digged out this old bug-report: https://bugzilla.mozilla.org/show_bug.cgi?id=986924, but I am not sure.

Also note that if the file is NOT plain text, Chrome and Firefox agree on sending the original length.

I will have to dig more into this.

I also have a Mac, Firefox 121.0.1 (64-bit) gives the error.

carlopi commented 5 months ago

I reported what I think it's the Firefox bug on bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1874840

daviewales commented 5 months ago

Thanks!

daviewales commented 5 months ago

Looking at your test site, I get this in Firefox 115 ESR:


https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.prj
347 using GET
562 using GET's arraybuffer's byteLength
347 using HEAD

https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.shx
2204 using HEAD
2204 using GET
2204 using GET's arraybuffer's byteLength

https://raw.githubusercontent.com/duckdb/duckdb_spatial/main/test/data/nyc_taxi/taxi_zones/taxi_zones.shp
1586264 using HEAD
1586264 using GET
1586264 using GET's arraybuffer's byteLength

Is this what you get?

carlopi commented 5 months ago

According to this comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1874840#c3 there might be a problem in how Firefox implement the spec, will update on this.

And while looking at this, there is potentially also a Chrome bug in the same area: https://bugs.chromium.org/p/chromium/issues/detail?id=1519411

Bottom line for the moment duckdb-wasm + Firefox will fail on reading some files, I believe this is mostly relevant in the case of Spatial, I will wait a bit to see what's the assessment from the browser, I might need to add special handling around this in our networking code.