Open rpbouman opened 1 week ago
I should have included: I can open the file fine using duckdb cli:
DuckDb> SELECT * FROM st_read( 'C:\roland\projects\duckdb-wasm-snippets\3. reading xlsx\test.xlsx' );
┌────────────┬──────────┐
│ Fruit │ Calories │
│ varchar │ int32 │
├────────────┼──────────┤
│ Apple │ 52 │
│ Banana │ 89 │
│ Cantaloupe │ 39 │
└────────────┴──────────┘
Run Time (s): real 0.045 user 0.000000 sys 0.015625
changes: 0 total_changes: 0
DuckDb>
Question, can the file be opened when queried from an http endpoint?
Something like:
LOAD spatial;
SELECT * FROM st_read( 'http://127.0.0.1:8081/<filename>' )
while having a local http-server?
Question is basically whether this problem is with reading the format at all in Wasm (possibly due to different config in DuckDB-Wasm spatial) or this is a filesystem mismatch within duckdb-wasm. Thanks
Actually this seems to work, at least on the shell, from a remote endpoint: https://shell.duckdb.org/#queries=v0,LOAD-spatial~,SUMMARIZE-FROM-st_read('https%3A%2F%2Fraw.githubusercontent.com%2Fduckdb%2Fduckdb%20rs%2Fmain%2Fcrates%2Fduckdb%2Fexamples%2FMovies_Social_metadata.xlsx')~ (it's another file, but should likely be independent)
Hi @carlopi - I only just noticed you had two replies.
I can confirm that I can read it with DuckDB WASM when I call st_read explicitly and pass in the http url.
Interestingly, the example with the http url currently fails in the CLI (no WASM) with DuckDB 1.1.0 GA and the nightly build v1.1.2-dev44:
I think in the CLI explicitly loading httpfs is needed:
LOAD https;
LOAD spatial;
SUMMARIZE FROM st_read('https://raw.githubusercontent.com/duckdb/duckdb-rs/main/crates/duckdb/examples/Movies_Social_metadata.xlsx');
then works on my end.
And @rpbouman, on your previous comment: I was trying to test (with very low effort) whether this was supported at all by spatial in wasm / find where things go wrong, and it appears it's connected to the File-based filesystem. I will need to dig there, log system calls, and find where stuff start to differ (and why / how to fix it).
I think in the CLI explicitly loading httpfs is needed: ... then works on my end.
Ah! Ok, I thought that was enabled by default. After installing it explicitly, it works for me too. THanks for clearing that up!
What happens?
SELECT * FROM st_read( '<filename>' )
fails withError: IO Error: GDAL Error (4):
test.xlsx' not recognized as a supported file format.`To Reproduce
I'm running this page locally:
Sample xlsx file is attached. test.xlsx
Browser/Environment:
Chrome 129.0.6668.71, MS Edge 129.0.2792.65
Device:
Windows 11 Desktop
DuckDB-Wasm Version:
using https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.1-dev287.0/+esm
DuckDB-Wasm Deployment:
https://cdn.jsdelivr.net/npm/@duckdb/duckdb-wasm@1.28.1-dev287.0/+esm
Full Name:
Roland Bouman
Affiliation:
EPAM Systems BV Netherlands