duckdb / duckdb_spatial

MIT License
489 stars 40 forks source link

Copy from attached Postgres/Postgis DB #317

Open barnabasharris opened 6 months ago

barnabasharris commented 6 months ago

Hi, If I have a Postgres/PostGIS DB containing tables with geometry columns of type public.geometry(multilinestring, 27700) I can attach it to my DuckDB database (with spatial extension already installed and loaded) with:

D ATTACH 'dbname=bharris user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, READ_ONLY);

I notice that the geometry columns are represented as type VARCHAR:

D show all tables;
┌──────────┬─────────┬──────────────────────┬──────────────────────┬───────────────────────────────────────────────┬───────────┐
│ database │ schema  │         name         │     column_names     │                 column_types                  │ temporary │
│ varchar  │ varchar │       varchar        │      varchar[]       │                   varchar[]                   │  boolean  │
├──────────┼─────────┼──────────────────────┼──────────────────────┼───────────────────────────────────────────────┼───────────┤
│ db       │ public  │ geography_columns    │ [f_table_catalog, …  │ [VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGE…  │ false     │
│ db       │ public  │ geometry_columns     │ [f_table_catalog, …  │ [VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGE…  │ false     │
│ db       │ public  │ spatial_ref_sys      │ [srid, auth_name, …  │ [INTEGER, VARCHAR, INTEGER, VARCHAR, VARCHAR] │ false     │
│ db       │ rs      │ chron_key_dating     │ [d_type, descripti…  │ [VARCHAR, VARCHAR, TIMESTAMP]                 │ false     │
│ db       │ rs      │ chron_reliability    │ [r_rank, descripti…  │ [INTEGER, VARCHAR, TIMESTAMP]                 │ false     │
│ db       │ rs      │ dating_context_types │ [context_type, des…  │ [VARCHAR, VARCHAR, INTEGER]                   │ false     │
│ db       │ rs      │ earthwork            │ [uid, e_name, coun…  │ [INTEGER, VARCHAR, VARCHAR, VARCHAR, VARCHA…  │ false     │
│ db       │ rs      │ earthwork_geometry   │ [id, e_uid, data_s…  │ [INTEGER, INTEGER, VARCHAR, VARCHAR, VARCHA…  │ false     │

And I can't do any spatial operations on them.

What's the best method for getting DuckDB to recognise these columns as geometry columns?

barnabasharris commented 6 months ago

Just read some earlier issues around this and discovered the workaround of dumping the geometry column to a new column (named e.g. geometry_wkb) of type bytea using the PostGIS function ST_AsEWGB(geometry) prior to attachment. Then DuckDB can make the a new geometry column using ST_GeomFromWKB(geometry_wkb).

Unsure of the technical implications but from a user perspective it would be great if this process could occur automatically either on attachment or copy of geometry column from Postgres to DuckDB.

Riezebos commented 4 months ago

I ran into the same problem and found another workaround: The VARCHAR column that DuckDB receives can be converted directly to a geometry by calling ST_GeomFromHEXWKB on it.

This way Postgis tables can be used by directly querying them e.g.:

SELECT ST_GeomFromHEXWKB(lonlat) from pg_db.public.my_table