duckdb / duckdb_spatial

MIT License
455 stars 33 forks source link

Not implemented Error: WKB Reader: Geometry type 11 not supported #365

Closed marklit closed 1 month ago

marklit commented 1 month ago

I'm using v0.10.2 1601d94f94.

select * from st_read('GHUB2018.gpkg') LIMIT 1;
Not implemented Error: WKB Reader: Geometry type 11 not supported

It's the GPKG inside the of the RAR download from here: https://zenodo.org/records/5168383

Any way to ignore the geometry at least if it can't be implemented?

marklit commented 1 month ago

As a workaround, I tried treating the GPKG file as a SQLite3 database but DuckDB isn't happy with the way the geom field is structured. QGIS can open this file without issue through.

ATTACH 'GHUB2018.gpkg' AS ghub;
USE ghub;
.schema --indent
CREATE TABLE UE_2018(
  fid BIGINT PRIMARY KEY,
  geom VARCHAR,
  UE_Area DOUBLE,
  CONTINENT VARCHAR,
  Country VARCHAR
);;
SELECT * FROM UE_2018 LIMIT 1;3
Mismatch Type Error: Invalid type in column "geom": column was declared as text, found "GP" of type "blob" instead.
SELECT * EXCLUDE(geom) FROM UE_2018 LIMIT 1;
┌───────┬────────────────────┬───────────┬─────────────┐
│  fid  │      UE_Area       │ CONTINENT │   Country   │
│ int64 │       double       │  varchar  │   varchar   │
├───────┼────────────────────┼───────────┼─────────────┤
│     1 │ 485.43857826704914 │ Oceania   │ New Zealand │
└───────┴────────────────────┴───────────┴─────────────┘
SELECT * EXCLUDE(geom), geom::BLOB FROM UE_2018 LIMIT 1;
Mismatch Type Error: Invalid type in column "geom": column was declared as text, found "GP" of type "blob" instead.
SELECT * EXCLUDE(geom), geom::TEXT FROM UE_2018 LIMIT 1;
Mismatch Type Error: Invalid type in column "geom": column was declared as text, found "GP" of type "blob" instead.
Konkrad commented 1 month ago

I have a similar problem:

WKB type '942747698' is not a supported geometry type

(and many other types)

when trying to converting a csv to geoparquet with the nightly build of duckdb.

What makes me a bit suspicious is that I create geometries like this:

copy (select *, ST_Point(LON, LAT) as geometry from read_csv('input.csv',PARALLEL=TRUE)) 

It seems not to be related to the geometry column as I don't have any error when I exclude all the others. The more columns I excluded from the CSV the likelier was it that I ran without a problem.

Maxxen commented 1 month ago

@marklit

It seems like the file contains curve geometries, which DuckDB does not support. You can work around this by having st_read output the raw WKB (without converting it to GEOMETRY) by passing the optional keep_wkb = TRUE argument to it.

You should then be able to convert the WKB column into GEOMETRY with a TRY_CAST to have the unsupported geometries (all failed casts) turn to NULL.

Maxxen commented 1 month ago

@Konkrad

It seems like you have an unrelated issue but I don't really understand what you mean. Could you share exactly what you are trying to do, the full SQL and ideally a dataset that reproduces the issue?

marklit commented 1 month ago

I'll leave this here to help anyone else who has unsupported geometry. This will allow you to exclude the records that aren't supported and work with the ones that are.

select substr(geom::BLOB::TEXT, 0, 16) as geom_type_str,
       count(*)
from st_read('GHUB2018.gpkg', keep_wkb=TRUE)
group by 1;
┌─────────────────┬──────────────┐
│  geom_type_str  │ count_star() │
│     varchar     │    int64     │
├─────────────────┼──────────────┤
│ \x01\x0C\x00\x0 │           72 │
│ \x01\x06\x00\x0 │        10170 │
└─────────────────┴──────────────┘

The 10K \x01\x06\x00\x0 records are MULTIPOLYGONs and are fine. Below I've excluded the \x01\x0C\x00\x0 is the curve geometry in the WHERE clause which stops these records from being interpreted by the Spatial Extension.

select ST_GeometryType(geom::GEOMETRY) as geom_type,
       count(*)
from st_read('GHUB2018.gpkg', keep_wkb=TRUE) 
WHERE substr(geom::BLOB::TEXT, 0, 16) != '\x01\x0C\x00\x0' 
group by 1;
┌───────────────┬──────────────┐
│   geom_type   │ count_star() │
│ geometry_type │    int64     │
├───────────────┼──────────────┤
│ MULTIPOLYGON  │        10170 │
└───────────────┴──────────────┘

It's likely the source material I was working with was produced with PostGIS which supports EWKB.

https://github.com/postgis/postgis/blob/80806376a566ebd1b2edfe0dd0a69dc1c3f718cd/deps/flatgeobuf/header_generated.h#L43

enum class GeometryType : uint8_t {
  Unknown = 0,
  Point = 1,
  LineString = 2,
  Polygon = 3,
  MultiPoint = 4,
  MultiLineString = 5,
  MultiPolygon = 6,
  GeometryCollection = 7,
  CircularString = 8,
  CompoundCurve = 9,
  CurvePolygon = 10,
  MultiCurve = 11,
  MultiSurface = 12,
  Curve = 13,
  Surface = 14,
  PolyhedralSurface = 15,
  TIN = 16,
  Triangle = 17,
  MIN = Unknown,
  MAX = Triangle
};

So from CircularString onward isn't supported in this extension by the looks of things.

From what I gather on https://libgeos.org/specifications/wkb/ a WHERE clause that only allows the second byte to be between 0x01 and 0x07 could be a more universal filter for unsupported geometry.

GEOS only supports the seven original simple features geometry types.

enum wkbGeometryType {
    wkbPoint = 1,
    wkbLineString = 2,
    wkbPolygon = 3,
    wkbMultiPoint = 4,
    wkbMultiLineString = 5,
    wkbMultiPolygon = 6,
    wkbGeometryCollection = 7
};

This should filter out any EWKB records and allow you to work with the geometry that is supported in this extension.

select ST_GeometryType(geom::GEOMETRY) as geom_type,
       count(*)
from st_read('GHUB2018.gpkg', keep_wkb=TRUE)
WHERE ('0x' || substr(geom::BLOB::TEXT, 7, 2))::int < 8
group by 1;
┌───────────────┬──────────────┐
│   geom_type   │ count_star() │
│ geometry_type │    int64     │
├───────────────┼──────────────┤
│ MULTIPOLYGON  │        10170 │
└───────────────┴──────────────┘