duckdb / duckdb_spatial

MIT License
493 stars 41 forks source link

Fix st_read GDAL virtual file systems IO Error #268

Closed mick closed 9 months ago

mick commented 9 months ago

Fixes https://github.com/duckdb/duckdb_spatial/issues/255

disclaimer: I dont write c++ and dont know much of the internals of either of these projects, so please let me know if there a better / preferred approach.

❯ ./build/debug/duckdb
v0.10.0 20b1486d11
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select * from st_read('/vsigzip/la-census-tracts.geojson.gz');
┌──────────┬─────────┬─────────┬────────────────────┬────────────────────┬─────────────────────────────────────────────────────────────────────────────────────┐
│ OBJECTID │  CT20   │  LABEL  │   Shape.STArea()   │  Shape.STLength()  │                                        geom                                         │
│  int32   │ varchar │ varchar │       double       │       double       │                                      geometry                                       │
├──────────┼─────────┼─────────┼────────────────────┼────────────────────┼─────────────────────────────────────────────────────────────────────────────────────┤
│     4752 │ 900104  │ 9001.04 │  458512830.4189453 │  86062.95741312855 │ POLYGON ((-117.8093662 34.6166349, -117.8093572 34.6161489, -117.8093362 34.61505…  │

...etc..

Added a very basic test for use of VSI. I was gonna include a test writing a file and use that newly written gzipped file, rather than including the one in this PR, but COPY TO using /vsigzip/ prefix doesnt work yet: Error: Not implemented Error: GDAL Error (6): Seeking on writable compressed data streams not supported. Maybe that can be addressed in the future. I assume it more involved change.

Maxxen commented 9 months ago

Hi! Great work! If the CI passes im inclined to merge this if it solves your problem! I still plan to look into this deeper but after a quick glance I don't think this actually solves the underlying issue of not being able to chain GDAL's auxiliary virtual filesystems within DuckDBs filesystem plugin, and so Im not sure this will work on e.g. WASM or in combination with DuckDBs httpfs. But as I said if it solves your use-case with local gzipped files that's good enough reason to merge for now anyway.

mick commented 9 months ago

@Maxxen yeah it does fix my usage, but yeah it not a holistic fix.

As you mentioned, maybe adding the duckdb vsi prefix should happen in a different order. In most cases probably the user supplied VSIs should come before the duckdb vsi prefix. maybe for all them except the network based VSIs?