Open Sternbach-Software opened 1 year ago
Maybe with this SO answer about cmp.Equal() for structs?
Looks like this is the function: `func (node *IndividualNode) Similarity(other *IndividualNode, options SimilarityOptions) float64 {}`. Though I'm not sure we want to change their similarity (which checks whether they are the same person) so much as exclude them from the final HTML when they are similar in everything except sources (as measured solely by _APID). Changing the similarity would still have the desired effect.
Tag.Is(Tag) is here, but it doesn't look like that is used to exclude it from the diff.
Found where the code should go, in SimpleNode.Equals(). You may want to add a command line param to enable or disable this.
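A minimal sketch of how such a switch could be wired up with Go's standard `flag` package. The flag name `-sources-by-apid`, the `Options` struct and `parseOptions` are made up for illustration; none of them are part of the gedcom CLI today.

```go
package main

import (
	"flag"
	"fmt"
)

// Options holds a hypothetical switch; the real CLI has no such option.
type Options struct {
	SourcesByAPID bool
}

// parseOptions parses args with its own FlagSet so it can be exercised
// without touching the process-wide flag state.
func parseOptions(args []string) (Options, error) {
	var opts Options
	fs := flag.NewFlagSet("diff", flag.ContinueOnError)
	fs.BoolVar(&opts.SourcesByAPID, "sources-by-apid", false,
		"treat sources with the same _APID as equal (hypothetical flag)")
	err := fs.Parse(args)
	return opts, err
}

func main() {
	opts, err := parseOptions([]string{"-sources-by-apid"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.SourcesByAPID) // prints true
}
```

Defaulting the flag to false would keep the current diff behaviour unless someone opts in.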
And how would you address the difference in how Ancestry and Geni output dates? These two should be equivalent, not similar (I want to keep even matches that are 99.99% similar, but not this one).
Field | Value
---|---
Address | Flushing
City | Flushing
State | New York
Continued | New York United States of America
Country | United States of America
Place | Flushing, Queens County, New York, United States of America
For cases like this, we need to add some custom comparison for PlaceNode (https://github.com/elliotchance/gedcom/blob/master/place_node.go). I can't remember off the top of my head how I implemented the similarity, but there should probably be an interface that nodes can implement if they want custom similarity logic.
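As a rough sketch of that idea: an optional interface that the comparison code checks for with a type assertion, falling back to a default otherwise. Every name here (`Node`, `CustomSimilarity`, `similarity`, `basicNode`, `placeNode`) is hypothetical and simplified; these are not the library's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal node interface for this sketch.
type Node interface {
	Tag() string
	Value() string
}

// CustomSimilarity is a hypothetical optional interface. Nodes that
// implement it supply their own similarity score in the range 0..1.
type CustomSimilarity interface {
	SimilarityTo(other Node) float64
}

// similarity prefers the node's own logic and otherwise falls back to an
// exact tag-and-value match.
func similarity(a, b Node) float64 {
	if cs, ok := a.(CustomSimilarity); ok {
		return cs.SimilarityTo(b)
	}
	if a.Tag() == b.Tag() && a.Value() == b.Value() {
		return 1
	}
	return 0
}

type basicNode struct{ tag, value string }

func (n basicNode) Tag() string   { return n.tag }
func (n basicNode) Value() string { return n.value }

// placeNode opts in to custom logic; here it only ignores letter case.
type placeNode struct{ basicNode }

func (p placeNode) SimilarityTo(other Node) float64 {
	if strings.EqualFold(p.Value(), other.Value()) {
		return 1
	}
	return 0
}

func main() {
	a := placeNode{basicNode{"PLAC", "Flushing, New York"}}
	b := placeNode{basicNode{"PLAC", "FLUSHING, NEW YORK"}}
	fmt.Println(similarity(a, b))                     // custom logic: prints 1
	fmt.Println(similarity(a.basicNode, b.basicNode)) // default logic: prints 0
}
```

The type assertion keeps existing node types untouched; only nodes that need special handling have to implement the extra method.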
It might make the most sense to add a String method that reduces a place to a single-line string, then perform the comparison on the strings. So, in this case the similarity would be computed between:
Flushing, New York New York United States of America, United States of America
Flushing, Queens County, New York, United States of America
These are still not exactly equal, but they are close enough to give a high similarity number that should be over the "equals" threshold.
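To make the flatten-then-compare idea concrete, here is a small sketch. The `place` struct, its `String` join rule, and the word-set Jaccard score are all assumptions chosen for brevity; the Jaccard score is a stand-in, not the similarity metric the library actually uses, so whether the result clears the "equals" threshold depends on the real metric and threshold.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// place is a hypothetical, simplified view of the structured fields.
type place struct{ City, State, Continued, Country string }

// String joins the non-empty fields into a single line.
func (p place) String() string {
	var parts []string
	for _, s := range []string{p.City, p.State, p.Continued, p.Country} {
		if s != "" {
			parts = append(parts, s)
		}
	}
	return strings.Join(parts, ", ")
}

// tokens lowercases s, splits it on anything that is not a letter or
// digit, and returns the set of distinct words.
func tokens(s string) map[string]bool {
	set := make(map[string]bool)
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		set[w] = true
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the two word sets.
// Two strings with no words at all count as identical.
func jaccard(a, b string) float64 {
	ta, tb := tokens(a), tokens(b)
	if len(ta) == 0 && len(tb) == 0 {
		return 1
	}
	shared := 0
	for w := range ta {
		if tb[w] {
			shared++
		}
	}
	return float64(shared) / float64(len(ta)+len(tb)-shared)
}

func main() {
	structured := place{
		City:      "Flushing",
		State:     "New York",
		Continued: "New York United States of America",
		Country:   "United States of America",
	}
	single := "Flushing, Queens County, New York, United States of America"
	fmt.Println(structured.String())
	fmt.Printf("%.2f\n", jaccard(structured.String(), single))
}
```

A word-set score ignores repeated words and ordering, which suits the duplicated "New York" and "United States of America" fragments here; an edit-distance style metric would penalise them more.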
Running `gedcom diff -hide-equal` on two Ancestry GEDCOM files reports the same profiles with the same sources as 99.58% similar but still different, because their sources have different SOUR tags but the same _APID tags (_APID is the UID of an Ancestry source). Is there a way to check source equality by _APID? If not from the CLI, where in the code would I do this? All I want is that if two sources are the same, they don't appear in the HTML diff output (and if they were the only difference, that the individual is not included in the diff at all).
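For illustration, this is roughly the comparison I have in mind, written against a made-up `Node` struct rather than the library's real node types. `apid` and `sameSource` are hypothetical helpers, and the _APID values below are invented.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a GEDCOM node.
type Node struct {
	Tag      string
	Value    string
	Children []*Node
}

// apid returns the value of the first _APID child, or "" if there is none.
func apid(n *Node) string {
	for _, c := range n.Children {
		if c.Tag == "_APID" {
			return c.Value
		}
	}
	return ""
}

// sameSource reports whether two SOUR nodes should be treated as equal.
// When both carry an _APID, only the _APID values are compared; otherwise
// it falls back to comparing the SOUR values themselves.
func sameSource(a, b *Node) bool {
	if a.Tag != "SOUR" || b.Tag != "SOUR" {
		return false
	}
	ia, ib := apid(a), apid(b)
	if ia != "" && ib != "" {
		return ia == ib
	}
	return a.Value == b.Value
}

func main() {
	// Different SOUR pointers, same (invented) _APID.
	left := &Node{Tag: "SOUR", Value: "@S1@", Children: []*Node{
		{Tag: "_APID", Value: "1,7602::12345"},
	}}
	right := &Node{Tag: "SOUR", Value: "@S9@", Children: []*Node{
		{Tag: "_APID", Value: "1,7602::12345"},
	}}
	fmt.Println(sameSource(left, right)) // prints true
}
```

Sources without an _APID keep the existing value comparison, so files that did not come from Ancestry would behave as before.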