duckdb / duckdb_spatial

Understanding duckdb WKB geometry type (not readable by shapely) #188

Open ncclementi opened 8 months ago

ncclementi commented 8 months ago

On Ibis we are working on supporting the DuckDB spatial extension (see https://github.com/ibis-project/ibis/pull/7454), and we would like to better understand what comes out of a geometry column when we select from it.

The docs say it's WKB, but when we try to convert the result to a shapely object we get an error. Calling st_aswkb on the column first makes the conversion work, but it doesn’t seem like this should be necessary.

Here is a reproducer that @cpcloud put together that shows this: https://gist.github.com/cpcloud/82d98fb1bd1a4919c6a4c41643ad3141
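
For readers who don't want to open the gist, the workaround described above can be sketched roughly as follows. This is not the code from the reproducer: the helper names wkb_select and fetch_shapely are hypothetical, and the sketch assumes an open DuckDB connection with the spatial extension already loaded and shapely installed.

```python
def wkb_select(column: str, table: str) -> str:
    """Build a SELECT that converts a GEOMETRY column to standard WKB.

    Hypothetical helper for illustration; identifiers are interpolated
    as-is, so only pass trusted column and table names.
    """
    return f"SELECT ST_AsWKB({column}) AS wkb FROM {table}"


def fetch_shapely(con, column: str, table: str) -> list:
    """Run the query on an open DuckDB connection and parse rows with shapely.

    Assumes `con` is a duckdb connection with the spatial extension loaded.
    """
    from shapely import wkb  # imported lazily so the module loads without shapely

    rows = con.execute(wkb_select(column, table)).fetchall()
    # The WKB blob is expected to arrive as a bytes-like object; NULLs stay None.
    return [wkb.loads(bytes(raw)) if raw is not None else None for (raw,) in rows]
```

With the duckdb Python package, a connection for this could be prepared with duckdb.connect() followed by con.install_extension("spatial") and con.load_extension("spatial"). Selecting the column without the st_aswkb call is what produces the bytes that shapely rejects.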

Maxxen commented 8 months ago

Hi!

DuckDB's internal representation of the geometry type is not actually WKB. In fact, it is very similar to the PostGIS geometry representation, although with a different "header". However, I can't promise that our geometry representation is fully stable yet. I still want to extend it (soon!) so that we can store, for example, multi-dimensional geometries (Z/M), SRID, and/or other properties such as whether a geometry is valid/solid/geodetic. In theory we should be able to do this in a backwards-compatible way, but again, no promises.

Here are some excerpts from the code that hopefully provide some insight into how the format works.

The header: (4 bytes) https://github.com/duckdb/duckdb_spatial/blob/b4cb5bd214c53f65a0d9f585fb9490bd50cbb4bf/spatial/include/spatial/core/geometry/geometry.hpp#L431-L435

The "properties" (1 byte) https://github.com/duckdb/duckdb_spatial/blob/b4cb5bd214c53f65a0d9f585fb9490bd50cbb4bf/spatial/include/spatial/core/geometry/geometry_properties.hpp#L8-L16

The full geometry (always a multiple of 8 bytes). The comments here are slightly outdated: if the geometry is non-empty and not a point, there is an 8xF32 bounding box serialized after the "header" and the 4-byte padding.

https://github.com/duckdb/duckdb_spatial/blob/b4cb5bd214c53f65a0d9f585fb9490bd50cbb4bf/spatial/src/spatial/core/geometry/geometry_factory.cpp#L134-L161
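
To make the layout above a bit more concrete, here is a minimal sketch of splitting off the leading header from a raw blob. It relies only on what is stated above (a 4-byte header that includes a 1-byte properties field, and a total size that is a multiple of 8 bytes). The field order (type byte, then properties byte, then two remaining bytes), the little-endian byte order, and the names GeometryHeader and read_header are assumptions made for illustration and should be checked against the linked geometry.hpp. Given that the format is not promised to be stable, this is for inspection only, not something to build on.

```python
import struct
from typing import NamedTuple


class GeometryHeader(NamedTuple):
    geometry_type: int  # assumed: first byte of the header
    properties: int     # assumed: second byte (the 1-byte "properties")
    rest: int           # remaining two header bytes, left uninterpreted here


def read_header(blob: bytes) -> GeometryHeader:
    """Unpack the assumed 4-byte header from a serialized geometry blob."""
    # The serialized geometry is described as always a multiple of 8 bytes.
    if len(blob) == 0 or len(blob) % 8 != 0:
        raise ValueError("expected a non-empty blob whose size is a multiple of 8")
    # "<BBH": little-endian (assumed), two unsigned bytes, one unsigned 16-bit value.
    geometry_type, properties, rest = struct.unpack_from("<BBH", blob, 0)
    return GeometryHeader(geometry_type, properties, rest)
```

Anything after the header and the 4-byte padding (the bounding box and the coordinate data) is deliberately left unparsed; the deserialization code linked below is the reference for that part.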

Full deserialization code:

https://github.com/duckdb/duckdb_spatial/blob/b4cb5bd214c53f65a0d9f585fb9490bd50cbb4bf/spatial/src/spatial/core/geometry/geometry_factory.cpp#L523-L691