Closed mdouchin closed 2 years ago
Hi @mdouchin and thanks for great & reproducible ticket! I think in principle it should work, at the end, db-sync works both ways, so if you change the PG database it propagates too back to mergin.
do you have particular use-case for this workflow, so we know if we need to prioritize this bugfix? It would be nice to have a catch-up with you, can you drop me a mail on peter.petrik@lutraconsulting.co.uk? Or do you attend FOSS4G Firenze any chance?
Hi @PeterPetrik I will attend the Foss4G in Firenze and would be glad to share ideas around geodiff
use cases.
I have 2 use cases in mind:
I will email you with more context. Let's stay connected and see us in Firenze !
I have checked the behaviour if I use only one database with a copy made in another schema, and it works well. So the issue seems to occur when using 2 separated PostgreSQL databases
# Reinit the database geodiff_source (see first message above)
#
# Copy content from the "eee" schema to a new schema "eee_copy"
geodiff copy --driver 'postgres' "host=localhost port=5432 user=mdouchin password=**** dbname=geodiff_source" eee eee_copy
# Insert new line
psql -d geodiff_source -c "INSERT INTO eee.points (p_name, geom) VALUES ('third', ST_MakePoint(3, 5))"
# Get the diff
geodiff diff --json --driver 'postgres' "host=localhost port=5432 user=mdouchin password=**** dbname=geodiff_source" eee_copy eee
# returns correct JSON
returns
{
"geodiff": [
{
"changes": [
{
"column": 0,
"new": 3
},
{
"column": 1,
"new": "third"
},
{
"column": 2,
"new": "R1AAAWoIAAABAQAAAAAAAAAAAAhAAAAAAAAAFEA="
}
],
"table": "points",
"type": "insert"
}
]
}
hi @mdouchin from a quick look into the code it seems there is a logic mistake in GEODIFF_createChangesetDr() that allows to short circuit possibly costly comparison:
if ( strcmp( driverSrcName, driverDstName ) == 0 )
{
return GEODIFF_createChangesetEx( contextHandle, driverSrcName, driverSrcExtraInfo, src, dst, changeset );
}
However in this comparison we should be also checking that driverSrcExtraInfo
and driverDstExtraInfo
are the same, to fix this bug.
Comparison of data between two different postgres databases is not particularly efficient - geodiff will export data of schemas to temporary sqlite files and then compare them.
For your use case to synchronize two databases, even if cross-database diff was efficient, you still need some common ancestor - if both databases change, to know what changes to move across (assuming both databases can change - if not then things are simpler). The best approach using geodiff would be to have two schemas in each of the databases:
Then whenever you sync, you would create diff between "base" and "normal" schemas in each database, apply the diff on the other database, and finally update the "base" schemas, something like this:
If only one database can change, things are easier, and only one "base" schema is needed during sync:
Hi @PeterPetrik @wonder-sk Thank you for your detailed explanation, completed by our discussion during this Foss4G2022 ! I will do some more tests, and report back if needed. Feel free to close this issue if necessary
Thanks, it is still a bug so lets keep this one open
@wonder-sk I need some more help here:
Then whenever you sync, you would create diff between "base" and "normal" schemas in each database, apply the diff on the other database, and finally update the "base" schemas, something like this:
1. create diff: A_normal - A_base = A_diff 2. create diff: B_normal - B_base = B_diff 3. if both diffs contain changes: rebase A_diff on top of B_diff (or vice versa) 4. apply diffs to A_normal and B_normal and A_base and B_base
In the step 3/
if I use the rebase-diff
method, the diff are well rebased, but if there is a conflict, it is not "removed" from the rebased diff. Is it done in purpose ? Which means geodiff client should rebase, then check the conflict file, and remove the conflicts listed in the conflict file from the rebased diff ?
Or should I prefer to just use the geodiff rebase-db
command which will only aplly the not conflicting changes ?
Do you prefer me to create a new issue for my questions regarding PostgreSQL database synchronisation (comment above ) for more clarity ?
Do you prefer me to create a new issue for my questions regarding PostgreSQL database synchronisation (comment above ) for more clarity ?
It's okay to keep the discussion here, sorry I just forgot to answer the questions above...
n the step 3/ if I use the rebase-diff method, the diff are well rebased, but if there is a conflict, it is not "removed" from the rebased diff. Is it done in purpose ? Which means geodiff client should rebase, then check the conflict file, and remove the conflicts listed in the conflict file from the rebased diff ?
That's intentional... The idea is that whenever there is an edit conflict while rebasing, we may not be able in a good position to resolve the conflict with a proper 3-way merge GUI (because rebase may be done on a phone with a user with limited knowledge, or it is done in an automated system like db-sync which we don't want to pause because of a minor conflict). So we simply accept the latest change as the winner of the edit conflict ("our" version wins), and this automatic conflict resolution info is stored to a new JSON file, which gets uploaded to the project directory, so the project admin can inspect the automatic conflict resolution and possibly override the values... Does that make sense?
Thanks a lot for your answer @wonder-sk
This makes sense ! Since this kind of conflict should happen rarely, it is indeed a robust and fast resolution method. I will propose a PR with some explanation whenever I find enough time to correctly write it down.
Do you think geodiff could provide a "conflict" merging system, for example to merge all the conflicts which have happened between 2 dates ? For example, if you have synchronized 10 times between the 2 databases, and once per week/month, you need to have a look at the conflicts, but in a simple way. It would be handy to read one single conflict file instead of possibly 3 or 4. I could propose a PR for this "conflict merging tool", even if my C++ coding skills are a bit rusty... I could also propose a Python QGIS processing alg which will do the merge and create a vectory layer to visualize the conflicts ?
Do you think geodiff could provide a "conflict" merging system, for example to merge all the conflicts which have happened between 2 dates ? For example, if you have synchronized 10 times between the 2 databases, and once per week/month, you need to have a look at the conflicts, but in a simple way. It would be handy to read one single conflict file instead of possibly 3 or 4. I could propose a PR for this "conflict merging tool", even if my C++ coding skills are a bit rusty... I could also propose a Python QGIS processing alg which will do the merge and create a vectory layer to visualize the conflicts ?
With this conflict merging system, do you mean a GUI which would show all the recorded conflicts (on a map or a table or both) and it would allow inspection of each conflict, with some options to change the conflict resolution? There are certainly more ways how to do that - probably the easiest would be a QGIS processing alg as you mentioned, or it could be a complete GUI within QGIS (the Mergin plugin already comes with a GUI to compare changes - see https://github.com/MerginMaps/qgis-mergin-plugin/pull/397 - only missing the bits related to conflict resolution).
Thanks !
Hi,
I am using the Dockerfile to build and use
geodiff
with the PostgreSQL driver enabled, with the following steps:NB: I created a bash alias to run the docker command with the simple
geodiff
Test data:
I successfully ran PostgreSQL related commands, for example:
Then I inserted one more feature in the unique table
eee.points
in thegeodiff
database, and try to create a diff between the two databases.The returned JSON is "empty".
I also tried with the same source database, but with a SQLite database as the target database, and it worked: the diff corresponded to the expected differences between the 2 databases:
which returns
I was wondering if it was possible to use
geodiff
for a PostgreSQL/PostgreSQL diff tool, or is it more "field oriented" and designed for GeoPackage as the pivot format ?Thanks a lot for your great work on this tool !