MerginMaps / geodiff

Library for handling diffs for geospatial data
https://merginmaps.com
MIT License
146 stars 16 forks source link

make_copy fails with legacy tables created with AddGeometryColumn #187

Open fpuga opened 2 years ago

fpuga commented 2 years ago

I have a legacy PostgreSQL now in version 9.6 (PostGIS 2.5).

This database have been upgraded from previous versions and the old tables where created using AddGeometryColumn, so a \dlooks like this, with the check constraints present.

         Column          │         Type         │ Collation │ Nullable │                      Default                       
═════════════════════════╪══════════════════════╪═══════════╪══════════╪════════════════════════════════════════════════════
gid                      │ integer              │           │ not null │ nextval('myschema.mytable_gid_seq'::regclass)
the_geom                 │ geometry             │           │          │ 
Check constraints:
    "enforce_dims_the_geom" CHECK (st_ndims(the_geom) = 2)
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'MULTIPOLYGON'::text OR the_geom IS NULL)
    "enforce_srid_the_geom" CHECK (st_srid(the_geom) = 25829)

In the copy process of mergin-db-sync an error is raised when processing these tables:

== starting mergin-db-sync daemon == version 1.1.1 ==
Logging in to Mergin...
Processing Mergin Maps project 'admin/myproject'
Connecting to the database...
Downloading latest Mergin Maps project admin/inventario to /tmp/dbsync/myproject
The base schema and the output GPKG do not exist yet, going to initialize them ...
Error: copy failed!
GEODIFF: NOTICE:  schema "mergin_myschema" does not exist, skipping
Error: Unknown geometry type: GEOMETRY

Traceback (most recent call last):
  File "dbsync_daemon.py", line 63, in <module>
    main()
  File "dbsync_daemon.py", line 35, in main
    dbsync.dbsync_init(mc, from_gpkg=False)
  File "/mergin-db-sync/dbsync.py", line 647, in dbsync_init
    init(conn, mc, from_gpkg)
  File "/mergin-db-sync/dbsync.py", line 615, in init
    _geodiff_make_copy(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.modified,
  File "/mergin-db-sync/dbsync.py", line 131, in _geodiff_make_copy
    _run_geodiff([config.geodiff_exe, "copy", "--driver-1", src_driver, src_conn_info, "--driver-2", dst_driver, dst_conn_info, "--skip-tables", _tables_list_to_string(ignored_tables), src, dst])
  File "/mergin-db-sync/dbsync.py", line 75, in _run_geodiff
    raise DbSyncError("geodiff failed!\n" + str(cmd))
dbsync.DbSyncError: geodiff failed!

With new tables that does not have the check constraints it works correctly:

Column │         Type         │ Collation │ Nullable │                       Default                       
════════╪══════════════════════╪═══════════╪══════════╪═════════════════════════════════════════════════════
 id     │ integer              │           │ not null │ nextval('myschema.testtable_id_seq'::regclass)
 geom   │ geometry(Point,4326) │           │          │ 

Not sure if this is a "bug" expected to be solved. Close it if you think that it is out of scope but it will be awesome if you have any advice or maybe a warning in the documentation can be added.

wonder-sk commented 2 years ago

Thanks for reporting the issue!

It does not seem like the constraints are the source of the issue (they are ignored by geodiff). The actual issue will have something to do with the content of geometry_columns table. Geodiff runs the following query to get more information about geometry type:

SELECT f_geometry_column, type, srid, coord_dimension FROM geometry_columns
  WHERE f_table_schema = '<your_schema>'  AND f_table_name = '<your_table>';

Could you please check what is the result of this query for your "old" table and what does it say for a "new" table?

fpuga commented 2 years ago

I check that in my tests and see no real difference.


          f_table_name           │ f_geometry_column │      type       │ srid  │ coord_dimension 
═════════════════════════════════╪═══════════════════╪═════════════════╪═══════╪═════════════════
 old_table                       │ the_geom          │ MULTIPOLYGON    │ 25829 │               2
 new_table                       │ geom              │ POINT           │  4326 │               2

Note that i'm using a different srid in the new_table but just because i make a quick test.

I have more old tables in the schema, some of then of type POINT and LINESTRING. I can make more tests, trying to narrow the problem if you think it is useful.

fpuga commented 2 years ago

I've created a reproducible example. Just note that my testing environment is PostgreSQL v9.6 and PostGIS v2.5.

Only one schema, with two tables. old_table has been created in the past with AddGeometryColumn, and new_table has been created now, for testing.

When trying to sync the schema with both tables an error is raised: ```bash $ docker run --name mergin_db_sync -it -e MERGIN__URL=http://172.17.0.1:5000/ -e MERGIN__USERNAME=admin -e MERGIN__PASSWORD=topsecret -e CONNECTIONS="[{driver='postgres', conn_info='host=172.17.0.1 dbname=test user=postgres password=postgres', modified='myschema', base='mergin_myschema', mergin_project='admin/mergin_project_test', sync_file='sync_db.gpkg'}]" lutraconsulting/mergin-db-sync:latest python3 dbsync_daemon.py --init-from-db == starting mergin-db-sync daemon == version 1.1.1 == Logging in to Mergin... Processing Mergin Maps project 'admin/mergin_project_test' Connecting to the database... Downloading latest Mergin Maps project admin/mergin_project_test to /tmp/dbsync/mergin_project_test The base schema and the output GPKG do not exist yet, going to initialize them ... GEODIFF: NOTICE: schema "mergin_myschema" does not exist, skipping Error: diff failed! GEODIFF: Error: Unknown geometry type: GEOMETRY Error: Failed to create a copy of modified source for driver postgres Traceback (most recent call last): File "dbsync_daemon.py", line 63, in main() File "dbsync_daemon.py", line 35, in main dbsync.dbsync_init(mc, from_gpkg=False) File "/mergin-db-sync/dbsync.py", line 647, in dbsync_init init(conn, mc, from_gpkg) File "/mergin-db-sync/dbsync.py", line 625, in init changes_gpkg_base = _compare_datasets("sqlite", "", gpkg_full_path, conn_cfg.driver, File "/mergin-db-sync/dbsync.py", line 148, in _compare_datasets _geodiff_create_changeset_dr(src_driver, src_conn_info, src, dst_driver, dst_conn_info, dst, tmp_changeset, ignored_tables) File "/mergin-db-sync/dbsync.py", line 140, in _geodiff_create_changeset_dr _run_geodiff([config.geodiff_exe, "diff", "--driver-1", src_driver, src_conn_info, "--driver-2", dst_driver, dst_conn_info, src, dst, changeset]) File "/mergin-db-sync/dbsync.py", line 75, in _run_geodiff raise DbSyncError("geodiff failed!\n" + str(cmd)) dbsync.DbSyncError: geodiff failed! ['/geodiff/build/geodiff', 'diff', '--driver-1', 'sqlite', '', '--driver-2', 'postgres', 'host=172.17.0.1 dbname=test user=postgres password=postgres', '/tmp/dbsync/mergin_project_test/sync_db.gpkg', 'mergin_myschema', '/tmp/ycykvzUI'] ```
When old_table is skipped everything works: ```bash $ docker run --name mergin_db_sync -it -e MERGIN__URL=http://172.17.0.1:5000/ -e MERGIN__USERNAME=admin -e MERGIN__PASSWORD=topsecret -e CONNECTIONS="[{driver='postgres', conn_info='host=172.17.0.1 dbname=test user=postgres password=postgres', modified='myschema', base='mergin_myschema', mergin_project='admin/mergin_project_test', sync_file='sync_db.gpkg', skip_tables=['old_table']}]" lutraconsulting/mergin-db-sync:latest python3 dbsync_daemon.py --init-from-db == starting mergin-db-sync daemon == version 1.1.1 == Logging in to Mergin... Processing Mergin Maps project 'admin/mergin_project_test' Connecting to the database... Downloading latest Mergin Maps project admin/mergin_project_test to /tmp/dbsync/mergin_project_test The base schema and the output GPKG do not exist yet, going to initialize them ... GEODIFF: NOTICE: schema "mergin_myschema" does not exist, skipping Init done! 2022-08-17 18:39:23.055443 Trying to pull Processing Mergin Maps project 'admin/mergin_project_test' No changes on Mergin Maps. Pull done! Trying to push Processing Mergin Maps project 'admin/mergin_project_test' No changes in the database. Push done! Going to sleep ```
And this the database schema that reproduces the bug ```bash psql -h localhost -p 5432 -U postgres -c "CREATE DATABASE test" psql -h localhost -p 5432 -U postgres -d test -f test_schema.sql ``` ```sql -- test_schema.sql -- Dumped from database version 9.6.19 -- Dumped by pg_dump version 9.6.19 SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET xmloption = content; SET client_min_messages = warning; SET row_security = off; CREATE SCHEMA myschema; CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog; COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language'; CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public; COMMENT ON EXTENSION postgis IS 'PostGIS geometry, geography, and raster spatial types and functions'; CREATE OPERATOR FAMILY public.gist_geometry_ops USING gist; CREATE TABLE myschema.new_table ( id integer NOT NULL, comments text, the_geom public.geometry(MultiPolygon,25829) ); CREATE SEQUENCE myschema.new_table_id_seq START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1; ALTER SEQUENCE myschema.new_table_id_seq OWNED BY myschema.new_table.id; CREATE TABLE myschema.old_table ( gid integer NOT NULL, nome character varying(37), the_geom public.geometry, CONSTRAINT enforce_dims_the_geom CHECK ((public.st_ndims(the_geom) = 2)), CONSTRAINT enforce_geotype_the_geom CHECK (((public.geometrytype(the_geom) = 'MULTIPOLYGON'::text) OR (the_geom IS NULL))), CONSTRAINT enforce_srid_the_geom CHECK ((public.st_srid(the_geom) = 25829)) ); CREATE SEQUENCE myschema.old_table_gid_seq START WITH 1 INCREMENT BY 2 NO MINVALUE NO MAXVALUE CACHE 1; ALTER SEQUENCE myschema.old_table_gid_seq OWNED BY myschema.old_table.gid; ALTER TABLE ONLY myschema.new_table ALTER COLUMN id SET DEFAULT nextval('myschema.new_table_id_seq'::regclass); ALTER TABLE ONLY myschema.old_table ALTER COLUMN gid SET DEFAULT nextval('myschema.old_table_gid_seq'::regclass); ALTER TABLE ONLY myschema.new_table ADD CONSTRAINT new_table_pkey PRIMARY KEY (id); ALTER TABLE ONLY myschema.old_table ADD CONSTRAINT old_table_pkey PRIMARY KEY (gid); ```
alexbruy commented 2 years ago

When we try to get table schema for old_table, for geometry column our query returns just geometry as a datatype which is not enough to determine correct geometry type and CRS, while for new_table we get proper datatype geometry(MultiPolygon, 25829).

As far as I can see this is related to changes in the system catalog between version 9 and supported recent releases.