francoisforster / gedcom-cleanup

A simple Kotlin library that compares GEDCOM files, cleans them and performs limited validation

ID problems with Gramps and Ancestry #2

Closed ennoborg closed 2 years ago

ennoborg commented 2 years ago

When I compare files generated by Gramps and Ancestry, I run into two different problems:

  1. The program complains about an unrecognized number "UBM".
  2. The program complains about unrecognized numbers such as 332108160602.

The first problem is linked to Gramps using "@SUBM@" as the submitter ID, and the program assuming that the ID (or pointer) is a single character followed by a number.

The second problem is linked to the length of the IDs on Ancestry. They all have the usual form of a character followed by a number, but that number is 12 digits long, so it does not fit in a 32-bit int.
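
Both cases can be handled by treating the pointer as an opaque string and only parsing the digits when a number is really needed. The following is a minimal sketch of that idea, not the library's actual code: `parsePointer` and `numericPart` are hypothetical names, and I am assuming the ID is whatever sits between the two `@` delimiters.

```kotlin
// Hypothetical helpers for illustration; not taken from gedcom-cleanup.

// A GEDCOM pointer is an identifier wrapped in @ signs, e.g. @SUBM@ or @I332108160602@.
private val POINTER = Regex("^@([^@]+)@$")

/** Returns the identifier between the @ delimiters, or null if the token is not a pointer. */
fun parsePointer(token: String): String? =
    POINTER.matchEntire(token.trim())?.groupValues?.get(1)

/**
 * Returns the digits after the leading type character as a Long,
 * or null when the rest of the ID is not numeric (as with SUBM).
 * Long is used because a 12-digit number overflows Int.
 */
fun numericPart(id: String): Long? = id.drop(1).toLongOrNull()

fun main() {
    println(parsePointer("@SUBM@"))          // SUBM
    println(parsePointer("@I332108160602@")) // I332108160602
    println(numericPart("I332108160602"))    // 332108160602
    println(numericPart("SUBM"))             // null
}
```

Keeping the ID as a string means nothing has to be assumed about its shape, and the nullable `Long` makes the non-numeric case explicit instead of raising a parse error.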

francoisforster commented 2 years ago

@ennoborg can you paste a snippet of the data the program has trouble with?

francoisforster commented 2 years ago

Or does it match something like this?

1 SUBM @SUBM@
0 @SUBM@ SUBM
1 NAME ...
0 @I332108160602@ INDI

francoisforster commented 2 years ago

If that's the case, it should be fixed with https://github.com/francoisforster/gedcom-cleanup/commit/7318dfd6ff990d6780ba54375d300f0f19a80cee

ennoborg commented 2 years ago

You got that right, and the fix works well. Now it turns out that Ancestry doesn't follow the GEDCOM standard for dates, so your string comparison will always fail: Ancestry writes all month names in full, in mixed upper and lower case, where the standard expects three-letter upper-case abbreviations.

So it looks like you need shortDate vars in Event.Matches, but I don't know whether you want to open a new issue for that.
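
To illustrate what I mean by a short date, here is a rough sketch that normalizes a date string before comparing. `normalizeDate` is a hypothetical name, not something in the library, and I am assuming the only differences are full English month names and letter case. It upper-cases every word, so it would also alter free-text date phrases.

```kotlin
// Hypothetical helper for illustration; not taken from gedcom-cleanup.

// Full English month names. The GEDCOM abbreviations (JAN, FEB, ...) are
// the first three letters of each name.
private val MONTHS = setOf(
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"
)

/** Upper-cases each word and shortens full month names to three letters. */
fun normalizeDate(raw: String): String =
    raw.trim().split(Regex("\\s+")).joinToString(" ") { word ->
        val upper = word.uppercase()
        if (upper in MONTHS) upper.take(3) else upper
    }

fun main() {
    println(normalizeDate("12 January 1900")) // 12 JAN 1900
    println(normalizeDate("3 SEP 1901"))      // 3 SEP 1901
    // Comparing normalized values makes the two spellings match.
    println(normalizeDate("12 January 1900") == normalizeDate("12 JAN 1900")) // true
}
```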

francoisforster commented 2 years ago

@ennoborg do you mind opening a new issue for it and attaching a snippet of the offending dates?

francoisforster commented 2 years ago

Closing this issue, as the ID problem is fixed.