duckdb / duckdb_spatial

Segment Fault with Overture's Land Use WKB data #319

Closed by marklit 6 months ago

marklit commented 6 months ago

The WKB data in Overture's new Land Use dataset, released today, causes a segmentation fault. This is with DuckDB v0.10.1 (4a89d97db8).

$ aws s3 --no-sign-request cp s3://overturemaps-us-west-2/release/2024-05-16-beta.0/theme=base/type=land_use/part-00065* ./
SELECT geometry FROM  READ_PARQUET('part-0006[57]*.parquet') LIMIT 1;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                             geometry                                                              │
│                                                               blob                                                                │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ \x00\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\x0C@\x27E\xD4\x09H|f@N)\xA9\xE7O\x93\x0A@\x27E\xEF%\xC5\xEB&@N)\xB2\xC4\x08K8@…  │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
SELECT ST_GEOMFROMWKB(geometry) FROM  READ_PARQUET('part-00065*.parquet') LIMIT 1;
Segmentation fault

Maxxen commented 6 months ago

I don't know why Overture decided to big-endian encode their WKB, but I'm pretty sure this was fixed in #296. You should be able to get the latest patched version of spatial for v0.10.1 by force-reinstalling it with FORCE INSTALL spatial; alternatively, update to v0.10.2.
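The first byte of every WKB value is its byte-order flag: 0x00 means big endian (XDR) and 0x01 means little endian (NDR). The blob in the original report starts with 0x00. The Python sketch below is illustrative only and is not part of DuckDB or the spatial extension. `read_polygon_header` is a hypothetical helper, and `SAMPLE` holds only the first 29 bytes copied from that blob. The helper decodes the header using the byte order the flag asks for. The script then shows what a reader that wrongly assumes little endian would get for the geometry type.

```python
import struct

# First 29 bytes of the blob shown in the report (header + first coordinate pair only).
SAMPLE = (
    b"\x00"                # byte-order flag: 0x00 = big endian (XDR)
    b"\x00\x00\x00\x03"    # geometry type 3 = Polygon
    b"\x00\x00\x00\x01"    # number of rings
    b"\x00\x00\x00\x0c"    # number of points in the first ring
    b"@'E\xd4\tH|f"        # X of the first point (8-byte IEEE 754 double)
    b"@N)\xa9\xe7O\x93\n"  # Y of the first point
)


def read_polygon_header(wkb: bytes):
    """Hypothetical helper: decode byte order, ring/point counts and the first
    point of a 2D WKB Polygon, honoring the byte-order flag."""
    flag = wkb[0]
    if flag == 0:
        prefix, order = ">", "big"      # XDR
    elif flag == 1:
        prefix, order = "<", "little"   # NDR
    else:
        raise ValueError(f"invalid WKB byte-order flag: {flag:#x}")
    (geom_type,) = struct.unpack_from(prefix + "I", wkb, 1)
    if geom_type != 3:
        raise ValueError(f"expected a Polygon (type 3), got type {geom_type}")
    # Polygon layout: ring count, then the first ring's point count, then points.
    num_rings, num_points = struct.unpack_from(prefix + "II", wkb, 5)
    x, y = struct.unpack_from(prefix + "dd", wkb, 13)
    return order, num_rings, num_points, (x, y)


print(read_polygon_header(SAMPLE))

# Ignoring the flag and assuming little endian scrambles every integer field:
(misread_type,) = struct.unpack_from("<I", SAMPLE, 1)
print(misread_type)  # 50331648 (0x03000000) instead of 3
```

Read as little endian, the ring and point counts come out just as wrong as the type (16777216 rings and 201326592 points). A parser that trusted those counts could read far past the end of the buffer. Whether that is exactly what crashed v0.10.1 before #296 is an assumption; the thread doesn't say.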

marklit commented 6 months ago

That works, thank you.

jwass commented 6 months ago

@Maxxen @marklit I took a look into it. We are just using Sedona's GeoParquet writer, and I think it writes big endian by default. I pulled up some data from our January release and confirmed it has been big endian all along. Not sure how big a deal that is, but we can look into configuring Sedona to write little endian if possible. @jiayuasu
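Both byte orders are valid WKB, since every value starts with its own byte-order flag. If Sedona's writer can't be switched, one hypothetical workaround is to rewrite the blobs after the fact. The sketch below is plain Python, not a Sedona or DuckDB API. `polygon_to_little_endian` is a made-up helper that only handles 2D Polygons (type 3). It rejects Z/M coordinates, EWKB SRID flags and multi-part types, so treat it as an illustration of the byte swap rather than a tool for the whole dataset.

```python
import struct


def polygon_to_little_endian(wkb: bytes) -> bytes:
    """Hypothetical helper: re-encode a 2D WKB Polygon (either byte order)
    as little-endian (NDR) WKB. Z/M, EWKB SRIDs and other types are rejected."""
    flag = wkb[0]
    if flag not in (0, 1):
        raise ValueError(f"invalid WKB byte-order flag: {flag:#x}")
    src = ">" if flag == 0 else "<"
    (geom_type,) = struct.unpack_from(src + "I", wkb, 1)
    if geom_type != 3:
        raise ValueError(f"only 2D Polygons (type 3) are handled, got {geom_type}")
    (num_rings,) = struct.unpack_from(src + "I", wkb, 5)
    offset = 9
    out = bytearray(struct.pack("<BII", 1, 3, num_rings))  # NDR flag, type, ring count
    for _ in range(num_rings):
        (num_points,) = struct.unpack_from(src + "I", wkb, offset)
        offset += 4
        # Each point is two 8-byte doubles; unpack_from raises struct.error if
        # the buffer is shorter than the declared count instead of over-reading.
        coords = struct.unpack_from(f"{src}{2 * num_points}d", wkb, offset)
        offset += 16 * num_points
        out += struct.pack("<I", num_points)
        out += struct.pack(f"<{2 * num_points}d", *coords)
    return bytes(out)


# Usage: a big-endian unit square, converted to little endian.
ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
big = struct.pack(">BIII", 0, 3, 1, len(ring)) + b"".join(
    struct.pack(">dd", x, y) for x, y in ring
)
little = polygon_to_little_endian(big)
print(big[:5].hex(), "->", little[:5].hex())  # 0000000003 -> 0103000000
```

Coordinates are unpacked and repacked as 8-byte doubles without any arithmetic on them, and the output has the same length as the input. A truncated blob raises `struct.error` instead of reading past the buffer.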