DavidMStraub / gramps-web-sync

Development version of a Gramps addon to synchronize Gramps Web with Gramps Desktop
GNU General Public License v3.0
12 stars 2 forks source link

Unexpected Simultaneous Changes upon first sync #2

Open brucejackson opened 2 years ago

brucejackson commented 2 years ago

After getting webapisync to work on the Gramps Example database, I attempted to make it work on my own Family Tree.

I started with a backup! Then I followed the same process I used to test the Gramps Example database.

1) I created a new Family Tree for docker called docker-cchj. My Family Tree, in Gramps Deskop, is CCHJ.
2) I shutdown Gramps Desktop and stopped Docker. 2) I copied meta_data.db and sqlite.db from the directory containing CCHJ to the docker-cchj directory.
3) I opened Gramps directly to CCHJ.
4) I edited my docker config to use docker-cchj and I restarted Docker.
5) I ran webAPISync. I expected a message indicating the databases are identical, as the plugin did when I tested using the Gramps Example database. Instead, I go the message below indicating 33 Simultaneous Modifications. I did not edit any records during this period.

image

My first instinct is to run a repair on my database and try again. Do you have any other insights?

DavidMStraub commented 2 years ago

Thanks, that's very useful! What's happening is that the Gramps XML exported by Gramps inside docker and imported by your local Gramps creates slightly different objects from your original tree. This happened to me initially due to locale issues (like this one).

To find out what's causing it here, can you please

  1. Export an XML via the API (/api/exporters/gramps/file)
  2. Export an XML in your local Gramps
  3. Do a diff

It could be that they are the same and the issue happens later, but if they are different we already have a clue.

brucejackson commented 2 years ago

I haven't included the complete results and I edited a bit for visual clarity. I have 5400 lines of this in the diff result. I can make the full file available if you need it.

Apart from the first line 'created' all the differences are media paths ("\" versus "/") . I searched the file that I output the diff command to for some of the IDs in the image about and did not find them.

Here's a side note that may add some insight. For my first attempt to run the Gramps Example DB with webapisync I imported the example.gramps file into two separate databases (example-desktop) and example-docker). I imagine every timestamp was different and all the tags that Gramps sets on import. The synchronization crashed with an error. Too many differences.

My computer's locale is US English.

<created date="2021-10-27" version="AIO64-5.1.4-1"/>
<created date="2021-10-27" version="5.1.4"/>

<file src="People/G/Gunton__Reginald_Arthur/Guntun__Reginald _Arthur-0002.jpg" mime="image/jpeg" checksum="ed226678448c05c7568daa3edacdda4f" description="Reginal Gunton - Death notice, 1958/02/25"/>
<file src="People\G\Gunton__Reginald_Arthur\Guntun__Reginald _Arthur-0002.jpg" mime="image/jpeg" checksum="ed226678448c05c7568daa3edacdda4f" description="Reginal Gunton - Death notice, 1958/02/25"/>

<file src="Families/G/Gunton__Reginald+Jackson__Muriel/Gunton__Reginald+Jackson__Muriel_0119.jpg" mime="image/jpeg" checksum="f389d141afdb2a72b06aa8a0b3baada5" description="Reginald Gunton + Muriel Jackson - Wedding, 1921/06/21"/>
<file src="Families\G\Gunton__Reginald+Jackson__Muriel\Gunton__Reginald+Jackson__Muriel_0119.jpg" mime="image/jpeg" checksum="f389d141afdb2a72b06aa8a0b3baada5" description="Reginald Gunton + Muriel Jackson - Wedding, 1921/06/21"/>

<file src="People/J/Jackson__Muriel_Augusta/Muriel Augusta JACKSON_0003.jpg" mime="image/jpeg" checksum="78374ed3cf16d98236f293131fb722ba" description="Muriel Jackson - Death notice, 1968/09/17"/>
<file src="People\J\Jackson__Muriel_Augusta\Muriel Augusta JACKSON_0003.jpg" mime="image/jpeg" checksum="78374ed3cf16d98236f293131fb722ba" description="Muriel Jackson - Death notice, 1968/09/17"/>
DavidMStraub commented 2 years ago

Oh, so it's simply the slashes vs. backslashes. OK, this needs to be fixed. Related issue: https://github.com/gramps-project/gramps-webapi/issues/187

DavidMStraub commented 2 years ago

On second thought, I'm not sure this is the issue you're seeing as I assume you have much more than 33 media files? Can you please try replacing all backslashes with slashes in the XML file and do another diff?

brucejackson commented 2 years ago

Will do.

I have recorded the 33 ID numbers. I thought I should extract the entries from both xml files and share with you. You may see things I don't.

On Thu, Oct 28, 2021, 01:36 David Straub @.***> wrote:

On second thought, I'm not sure this is the issue you're seeing as I assume you have much more than 33 media files? Can you please try replacing all backslashes with slashes in the XML file and do another diff?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/DavidMStraub/gramps-addon-webapisync/issues/2#issuecomment-953517531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJTVOSB4LDVBCOXTYG4NLUJDVQVANCNFSM5G3EUZWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

brucejackson commented 2 years ago

I've just sent you a PM via discourse it contains a link containing the 33 items for both xml files.

Going through the events, I see custom event types that should be changed to Person Attributes, specifically type='martial status'. I do see some data quality issues that need to be addressed.

For individual I0149, I created a custom association Grandparent/Grandchild. From an obituary, I know the deceased, the children of the deceased and the grandchildren. I recorded all the family members but I don't know who the parents of each grandchild. I create the association so the grandchildren had some attachment to another person in the database.

For the media objects, I found it curious that the docker export used '\' in the src attribute. The Desktop AIO export used '/' in the src attribute. I did a double take on this and double checked the docker export. I did not expect this.

DavidMStraub commented 2 years ago

Thanks! While I'm looking into the files, another thing you can try is to export the local tree to XML, import it into a new tree, and sync that tree with the web API. If that doesn't find the 33 modified objects, we know that the issue is with something that disappears in the XML export.

brucejackson commented 2 years ago

You message came in as I was doing just that - an export, new tree, copy the new tree to docker and sync. That process corrected the issue - the plugin reports that local and remote are the same!

I tried a number of things prior to this:

I guess there was some kind of corruption or error in my current tree.

On Thu, Oct 28, 2021, 15:41 David Straub @.***> wrote:

Thanks! While I'm looking into the files, another thing you can try is to export the local tree to XML, import it into a new tree, and sync that tree with the web API. If that doesn't find the 33 modified objects, we know that the issue is with something that disappears in the XML export.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/DavidMStraub/gramps-addon-webapisync/issues/2#issuecomment-954145196, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJTVLMTKVB2GZFNULYBRLUJGYPHANCNFSM5G3EUZWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

DavidMStraub commented 2 years ago

OK good to know! So check and repair didn't solve the issue?

brucejackson commented 2 years ago

Correct - Check and Repair did not solve the issue.

On Thu, Oct 28, 2021 at 4:18 PM David Straub @.***> wrote:

OK good to know! So check and repair didn't solve the issue?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/DavidMStraub/gramps-addon-webapisync/issues/2#issuecomment-954171133, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJTVLRNJMYPOXVEERHZK3UJG4Z7ANCNFSM5G3EUZWQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

DavidMStraub commented 2 years ago

I'd still like to understand what is causing it. From the XML snippets, I didn't get a clue so far. Do you still have the original tree as well as the re-imported one in your local Gramps database? If you export an XML from either of them, are they identical? If yes, can you try a JSON export using the JSON exporter plugin? Perhaps there is a difference that doesn't show up in XML but in JSON. gramps.gen.merge.diff, which is used by the sync plugin, is based on object JSON representation.

brucejackson commented 2 years ago

The json export is showing some differences, unfortunately the exports are in different sort orders so I have to manually compare. Is possible to get a json export from the docker instance. Something like: /api/exporters/json/file

I'd like to compare the 3 stages:

DavidMStraub commented 2 years ago

Exporting to JSON from the API should be possible if the JSON plugin is installed, but not by default.

But already the exports from your two desktop trees will be very useful.

Perhaps you can sort them by handle using Python? If they are lists, something like

open("gramps.json") as f:
    data = json.load(f)

data = sorted(data, key=lambda obj: obj["handle"])

open("gramps_sorted.json") as f:
    json.dump(data, f)
brucejackson commented 2 years ago

I installed the JSON plugin on the docker container and got the export. To esnure I got you the correct files, I restored a family tree database from backups from 2 days ago. Then recreate my steps. I wasn't confident in the files I had saved during the process.

To start with is the 3 media objects that were in the list of 33. I have 3 rows for each GrampsID.

gramps_ID: O0090 - 2 citations listed with identical handle in desktop-33 and docker-33. In desktop-0, one unique citation. Only O0090 had duplicate citations in desktop-33 and docker-33.

gramps_ID: EBJ-O0135 - trailing whitespace in desc in desktop-33 and docker-33. No trailing whitespace in desktop-0. gramps_ID: EBJ-O0863 - trailing whitespace in desc in desktop-33 and docker-33. No trailing whitespace in desktop-0.

Only EBJ-O0135 and EBJ-O0863 had trailing whitespace in desc in desktop-33 and docker-33

https://www.dropbox.com/s/kfhdoblfo1mmzp1/media-objects.json?dl=0

This is what I see. I will look at individuals and events tomorrow. Let me know if its easier if I organize this info in a different manner.

DavidMStraub commented 2 years ago

Interesting, so we already found one issue: duplicate citation references. On importing the XML, the duplicates are discarded because of this: https://github.com/gramps-project/gramps/blob/master/gramps/gen/lib/citationbase.py#L89-L90 I checked that for other similar objects (adresses, URLs, tags, ...) this is not checked in Gramps and the add_... methods just append to the list without checking for duplicates.

I'm curious whether all of the 33 differences are caused by this.

DavidMStraub commented 2 years ago

As for the trailing space, I'm not sure at which point it is discarded - didn't find a strip() in the import/export XML code.

brucejackson commented 2 years ago

For Event objects, I see 3 differences between desktop-0 and docker-33 / desktop-33.

The duplicate citation issue is seen again in E0756.

An issue with place=

An issue with date (in the text attribute):

I'll send a link to zip file with all the event via PM as I am not sure how recent or old all the events are.

brucejackson commented 2 years ago

For individuals

Trailing space for:

IO149 is different issue. The value of birth_ref_index is different.

I don't see anything living people in this file so : https://www.dropbox.com/s/dftw6lr89237363/webapisync-issue2-individual-objects.zip?dl=0

brucejackson commented 2 years ago

As for the trailing space, I'm not sure at which point it is discarded - didn't find a strip() in the import/export XML code.

Think it may be here for descriptions of media objects: (https://github.com/gramps-project/gramps/blob/1d51576a28ec9c648631f3c4c8d0d74a88a5c8b6/gramps/plugins/export/exportxml.py#L1268)

desc_text = ' description="%s"' % self.fix(desc)

Which leads to here (https://github.com/gramps-project/gramps/blob/1d51576a28ec9c648631f3c4c8d0d74a88a5c8b6/gramps/plugins/export/exportxml.py#L454):

    def fix(self, line):
        l = str(line)
        l = l.strip().translate(strip_dict)
        return escxml(l)

And strip_dict includes spaces, if I interpret this correctly. (https://github.com/gramps-project/gramps/blob/1d51576a28ec9c648631f3c4c8d0d74a88a5c8b6/gramps/plugins/export/exportxml.py#L80)

# table for skipping control chars from XML except 09, 0A, 0D
strip_dict = dict.fromkeys(list(range(9))+list(range(11,13))+list(range(14, 32)))
cdhorn commented 2 years ago

Hey guys, have not been following this very closely but in glancing over your conversation this morning I noticed the recasting of place from null to ""

As it might also cause some unexpected surprises I wanted to point out that some type conversions are being done in emit.py in the WebApi code so the output validates consistently against the json schema.

In extract_object:

            if key == "rect" and value is None:
                value = []
            if key in ["mother_handle", "father_handle", "famc"] and value is None:
                value = ""

In extract_objects:

                if key in ["lat", "long"] and value is None:
                    value = 0

So I assume an edit/update of an object with a recast value in Gramps.js would save the recast value as part of the update. This is probably okay since the object was modified anyway but wanted to point it out.

DavidMStraub commented 2 years ago

Hi @cdhorn, you are right, this is why I needed to add a (not very elegant) function that undos these changes when PUTting modified objects: https://github.com/gramps-project/gramps-webapi/blob/master/gramps_webapi/api/resources/util.py#L878-L879

The good thing is that Web API explicitly checks for correct JSON schema on PUT, so any wrong type would lead to an error. This helps to avoid introducing database inconsistencies by sloppy code in Gramps.js.

In any case, I believe this is unrelated to the sync issue here as that directly uses the Gramps XML export. JSON is only used when syncing the changes back, but that uses the transactions endpoint which uses the raw/unmodified Gramps JSON objects.

DavidMStraub commented 2 years ago

@brucejackson thanks for the additional data. So we have the duplicate citations and plenty of trailing white space issues.

IO149 is an interesting case. It has a duplicate birth event ref and birth ref index points to the first instance in one case and the second in the other. Duplicate event refs are a bit more tricky than duplicate citation refs as they can have attributes and citations; in principle you could reference the same event twice and attach different attributes.

As for the duplicate citations and trailing whitespace, IMO this should be added to the check & repair addon.

The problem with the Web API Sync is that these issues will not only show false positives in the db diff, but they will in fact make it impossible to sync as Web API will complain that the objects have been modified during the Sync (the transaction contains the original and the new object version, and the original is compared with what is in the database, but the original has passed through an XML export/import on the way and lost some whitespace ...)

brucejackson commented 2 years ago

I'm curious about I0149 and the note about duplicate birth event refs. I interpret that as I should see 2 birth events for the person but I don't see 2 birth events in the user interface. Am I thinking about this wrong?

image

DavidMStraub commented 2 years ago

Oh, I was wrong. The event ref handles are distinct, so there is no duplicate. Strange that birth ref index is wrong in one database...