Closed cboettig closed 9 months ago
I don't think st_read
works with /vsicurl/ either anymore, Use DuckDB's HTTPFs instead when querying remote files.
Hmm, looks like it breaks due to DuckDB trying to glob the file path, we need to remove any vsi prefixes before calling into DuckDBs glob. That said, using httpfs prevents this issue anyway.
indeed, I see!
Aside, but is there a reason for dropping the ability to use gdal's vsi? I'll try and go the httpfs route. I love duckdb's httpfs, but there's lots of nice stuff in the GDAL vsi that is highly optimized (caching, writing, etc).
@Maxxen It looks like st_read()
is quite happy with vsi support again, yay! This significantly speeds up many operations.
However, it seems not to be working with st_read_meta()
?
Here's an example read with vsicurl. Works fine, and using vsicurl
also means it leverages gdal cache, so if you run this a second time it is much faster. (which is not true without vsicurl)!
import duckdb
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
tblname = "test"
## path with VSI
path = "/vsicurl/https://data.source.coop/cboettig/us-boundaries/mappinginequality.json"
query = f'''
CREATE OR REPLACE VIEW {tblname} AS SELECT * FROM st_read('{path}');
'''
con.sql(query)
con.table("test")
However, st_read_meta comes back blank unless we remove vsicurl prefix:
# path = "/vsicurl/https://data.source.coop/cboettig/us-boundaries/mappinginequality.json" # metadata empty
path = "https://data.source.coop/cboettig/us-boundaries/mappinginequality.json" # nonempty
query = f'''
CREATE OR REPLACE VIEW {tblname}_meta AS SELECT
layers[1].feature_count as feature_count,
layers[1].geometry_fields[1].name as geom_column_name,
layers[1].geometry_fields[1].type as geom_type,
layers[1].geometry_fields[1].crs.auth_name as name,
layers[1].geometry_fields[1].crs.auth_code as code,
layers[1].geometry_fields[1].crs.wkt as wkt,
layers[1].geometry_fields[1].crs.proj4 as proj4
FROM st_read_meta('{path}');
'''
con.sql(query)
con.table("test_meta")
It is obviously really really useful to be able to check the CRS of the data without downloading the whole data object. Of course we can just use gdal tools outside of duckdb
, but when presenting this material in the classroom it feels really awkward to drop into a completely different tool for this after having just explained how much better duckdb is for what we are doing :smile: .
Note that st_read_meta just returns empty when we try and read over the virtual filesystem interface:
If we download the file first, it works fine:
Note that gdal CLI tools have no problem doing this:
and notably,
duckdb
extension'sst_read()
is also quite happy to read over the vsi system.