duckdb / duckdb_spatial


Dropping vsi support breaks ability to read GDB from https? #270

Closed cboettig closed 9 months ago

cboettig commented 9 months ago

In previous versions, we could read geodatabase files over vsi, such as:

st_read("/vsicurl/https://github.com/cboettig/duckdbfs/raw/spatial-read/inst/extdata/world.gpkg")

This no longer works. However, we cannot read these files with the httpfs extension either: if we drop the /vsicurl/ prefix, it errors with

Error: Not implemented Error: HTTPFileSystem: DirectoryExists is not implemented!

on duckdb 0.9.2 using the nightly duckdb spatial extension (tested on Ubuntu 22.04).

p.s. thanks for all the amazing work on this package, it's really exciting and could soon be game-changing for our research program!

Maxxen commented 9 months ago

Cool. This seems like an issue with HTTPFS that we need to fix. But yes, since #268 I'm leaning towards enabling fallback to (or explicitly selecting) GDAL's /vsi/ if duckdb is unable to open the file.

To elaborate on why I'm trying to deprecate /vsi/

We've solved one half of the problem (why can't I use httpfs in st_read?) by making GDAL aware of DuckDB, so st_read can now do all DuckDB can do, but I guess we also need to provide a way to opt-out of that when the use case is the other way around, e.g. when you want to do something only GDAL can do (like reading files inside zip archives).

I'll work on adding back support for explicit /vsi/.
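As a rough illustration of the routing described above (not the extension's actual implementation), the following Python sketch sends a path straight to GDAL when it carries an explicit /vsi prefix, and otherwise tries DuckDB's file system first and falls back to GDAL on failure. The names `open_dataset`, `open_with_duckdb`, and `open_with_gdal` are hypothetical placeholders, and the choice of exceptions that trigger the fallback is an assumption.

```python
from typing import Callable


def open_dataset(
    path: str,
    open_with_duckdb: Callable[[str], object],
    open_with_gdal: Callable[[str], object],
) -> object:
    """Route a path to a backend (hypothetical sketch, not duckdb_spatial code)."""
    # An explicit /vsi.../ prefix is treated as an opt-out of DuckDB's file system.
    if path.startswith("/vsi"):
        return open_with_gdal(path)
    try:
        return open_with_duckdb(path)
    except (OSError, NotImplementedError):
        # Fallback: let GDAL try when DuckDB cannot open the file.
        return open_with_gdal(path)
```

Passing the two openers in as arguments keeps the sketch independent of any real library and makes each branch easy to exercise with stand-in functions.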

Very happy to hear DuckDB is being useful for your research, I would love to hear more about how you use it and how it helps you some time!

cboettig commented 9 months ago

Thanks for these explanations, really appreciate it.

Yup, the WASM stuff makes perfect sense. (I'm amazed any of GDAL is WASM-compatible already, maybe vsi will be too one day?)

Also, I agree entirely with you about the weird syntax choices GDAL made with the vsi system, but I think that's an orthogonal issue: it's of course possible to use GDAL VSI behind the scenes for remote access without confronting the user with this syntax. I believe this is common in much GDAL-binding software, e.g. in Python or QGIS, where the vsi prefixes are added by the software invisibly to the user, and the standard protocol prefixes like s3:// are shown to the user instead. While that can create problems in special cases that use more complex bits of GDAL syntax, it definitely avoids confusion in all the common cases.
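To make the "invisible prefix" idea concrete, here is a minimal Python sketch of how a binding might translate user-facing URLs into GDAL /vsi/ paths. The helper name `to_vsi_path` and the set of schemes it covers are assumptions chosen for illustration; real bindings handle more cases (archives, chained handlers, credentials).

```python
from urllib.parse import urlparse

# GDAL virtual file system prefixes for two common cloud URL schemes.
_VSI_PREFIXES = {
    "s3": "/vsis3/",
    "gs": "/vsigs/",
}


def to_vsi_path(url: str) -> str:
    """Translate a user-facing URL into a GDAL /vsi/ path (hypothetical helper)."""
    if url.startswith("/vsi"):
        return url  # already a VSI path, leave untouched
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    if scheme in ("http", "https", "ftp"):
        # /vsicurl/ takes the full URL, scheme included.
        return "/vsicurl/" + url
    if scheme in _VSI_PREFIXES:
        # Cloud handlers take bucket/key without the scheme.
        return _VSI_PREFIXES[scheme] + parsed.netloc + parsed.path
    return url  # local paths and unknown schemes pass through unchanged
```

With this helper, the https URL of world.gpkg from the original report maps back to the /vsicurl/ form used there, while the user only ever types the plain URL.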

Anyway, I think the builtin httpfs support will cover a lot of my use cases well and I'm excited to share that with my community, but I will keep an eye out for the VSI fallback support.

We're using duckdb a lot in our biodiversity spatial prioritization work, which we hope can inform the Global Biodiversity Framework. One example we just published uses duckdb to reveal social inequities in existing biodiversity data (Chapman et al 2024). I also try to teach these tools to both Berkeley students and colleagues. This is one of our efforts with NASA to make teaching materials more available, which should have some duckdb-spatial-based tutorials up soon!