Open Sternbach-Software opened 1 year ago
I'm wondering if the best you can do is remove definite duplicate fields or individuals.
There are two things to explore here:
Going with option 2, you may be able to test how that might work with something like:
-left-gedcom primary.gedcom -right-gedcom right1.gedcom -right-gedcom right2.gedcom
There might need to be some special options in this case, where the primary is always included and it just shows the closest match (if any) from each respective right GEDCOM.
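To make the "primary plus closest match from each right file" idea concrete, here is a rough sketch. It does not use the gedcom library at all: the Individual type, the similarity function (half the score for a name match, half for a matching birth year) and the 0.5 threshold are all made up for illustration, and the real comparison would be far more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// Individual is a hypothetical, simplified record. It is NOT the type used by
// the gedcom library.
type Individual struct {
	Name      string
	BirthYear int
}

// similarity is a stand-in score in [0, 1]: 0.5 for a case-insensitive name
// match plus 0.5 for an identical, non-zero birth year.
func similarity(a, b Individual) float64 {
	score := 0.0
	if strings.EqualFold(a.Name, b.Name) {
		score += 0.5
	}
	if a.BirthYear != 0 && a.BirthYear == b.BirthYear {
		score += 0.5
	}
	return score
}

// closestMatch returns the highest-scoring candidate in tree for p, or false
// if no candidate reaches minScore.
func closestMatch(p Individual, tree []Individual, minScore float64) (Individual, bool) {
	var best Individual
	bestScore := 0.0
	found := false
	for _, candidate := range tree {
		if s := similarity(p, candidate); s >= minScore && s > bestScore {
			best, bestScore, found = candidate, s, true
		}
	}
	return best, found
}

func main() {
	primary := []Individual{
		{Name: "Ann Lee", BirthYear: 1900},
		{Name: "Tom Roe", BirthYear: 1925},
	}
	// Example file names only; each right tree is searched independently.
	rightNames := []string{"right1.gedcom", "right2.gedcom"}
	rightTrees := [][]Individual{
		{{Name: "ann lee", BirthYear: 1900}},
		{{Name: "Tom Roe", BirthYear: 1926}, {Name: "Sue Poe", BirthYear: 1930}},
	}

	// Every primary individual is always listed, with the closest match (if
	// any) from each right tree next to it.
	for _, p := range primary {
		fmt.Printf("%s (%d)\n", p.Name, p.BirthYear)
		for i, tree := range rightTrees {
			if m, ok := closestMatch(p, tree, 0.5); ok {
				fmt.Printf("  %s: %s (%d)\n", rightNames[i], m.Name, m.BirthYear)
			} else {
				fmt.Printf("  %s: no match\n", rightNames[i])
			}
		}
	}
}
```

The point of the shape is that the work is one primary-vs-right pass per right file, so it grows linearly with the number of right files.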
In my case, I have many trees from Geni.com and Ancestry, and I don't know which one is most comprehensive, and some have a lot of information (not necessarily regarding overlapping parts of the tree) that the others don't, so I don't have a primary. It is possible that by comparing each one, I could identify (or create) a primary though, but it would be work.
Unless, similar to the tune command, there could be a command to determine which tree makes the most sense as the primary one and diff the others against it. Maybe the one which is missing the least? Though that may not be a good measure: a tree could have a lot of extra information that is incorrect, while the better primary is the revised tree with less information in it. Or, merge all of the "right" trees and compare that to the left.
The GEDCOM comparison can already make full use of multiple cores to speed it up, see -jobs:
https://github.com/elliotchance/gedcom/blob/master/cmd/gedcom/diff.go#L84-L86
And, perhaps even better, if it knows two individuals are the same (by an identifier) it can avoid the expensive comparison altogether.
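As a rough illustration of that shortcut (not the library's actual code), individuals that share an identifier can be paired up front so only the leftovers go through the expensive comparison. The Person type and its ID field are hypothetical, and the sketch assumes identifiers are unique within each file.

```go
package main

import "fmt"

// Person is a hypothetical record. ID stands for any stable identifier shared
// between files; it is empty when unknown.
type Person struct {
	ID   string
	Name string
}

// pairByID pairs individuals that share a non-empty ID and returns the
// individuals on each side that still need the expensive comparison.
func pairByID(left, right []Person) (pairs [][2]Person, restLeft, restRight []Person) {
	byID := make(map[string]Person)
	for _, r := range right {
		if r.ID != "" {
			byID[r.ID] = r
		}
	}

	matched := make(map[string]bool)
	for _, l := range left {
		// byID never holds an empty key, so records without an ID fall
		// through to restLeft.
		if r, ok := byID[l.ID]; ok {
			pairs = append(pairs, [2]Person{l, r})
			matched[l.ID] = true
		} else {
			restLeft = append(restLeft, l)
		}
	}

	for _, r := range right {
		if r.ID == "" || !matched[r.ID] {
			restRight = append(restRight, r)
		}
	}
	return pairs, restLeft, restRight
}

func main() {
	left := []Person{{ID: "id1", Name: "Ann Lee"}, {Name: "Tom Roe"}}
	right := []Person{{ID: "id1", Name: "Ann B. Lee"}, {ID: "id2", Name: "Sue Poe"}, {Name: "Thomas Roe"}}

	pairs, restLeft, restRight := pairByID(left, right)
	fmt.Println("paired by identifier:", len(pairs))
	fmt.Println("comparisons without the shortcut:", len(left)*len(right))
	fmt.Println("comparisons still needed:", len(restLeft)*len(restRight))
}
```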
However, consider the numbers: Comparing two small trees of 1000 individuals takes 1 million comparisons (that's fine), but three trees of 1000 individuals requires 1 billion comparisons to be exhaustive (not fine). Trying to compare many trees (even if they are quite small) will exponentially increase the processing time required.
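To put numbers on that, here is a small sketch. It assumes "exhaustive" means scoring every combination of one individual from each tree (n^k for k trees of n individuals), and contrasts it with diffing one primary tree against each of the others (n * n * (k - 1)). Both function names are made up for this example.

```go
package main

import "fmt"

// exhaustiveComparisons returns n^k: the number of combinations formed by
// picking one individual from each of k trees of n individuals.
func exhaustiveComparisons(n, k int) uint64 {
	total := uint64(1)
	for i := 0; i < k; i++ {
		total *= uint64(n)
	}
	return total
}

// againstPrimary returns n*n*(k-1): the cost of comparing one primary tree
// against each of the other k-1 trees, one pair of trees at a time.
func againstPrimary(n, k int) uint64 {
	return uint64(n) * uint64(n) * uint64(k-1)
}

func main() {
	const n = 1000
	for k := 2; k <= 5; k++ {
		fmt.Printf("%d trees: exhaustive=%d, against primary=%d\n",
			k, exhaustiveComparisons(n, k), againstPrimary(n, k))
	}
}
```

The exhaustive count is multiplied by 1000 for every extra tree, while the primary-based count only grows by another million per tree.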
Depending on what your goal is, it probably makes more sense to just choose a primary file and have everything work against that.
Over time, I have accumulated multiple trees across multiple platforms, and they have regrettably become out of sync. I want to condense them all into one tree, but that means diffing multiple trees, something gedcomdiff doesn't currently support.
It would be unbelievable if I could diff multiple GEDCOMs and display the diff in a single HTML file. The coloring would show whether a field is missing from one file or added/missing in the rest. It would be cool to see, for a given field, which files are missing it and which have it. Is this feasible?
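For the "which files have this field and which are missing it" part, the data behind such a view could look something like the sketch below. It assumes the same individual has already been matched across the files, which is the hard part, and uses a hypothetical Record type (field tag to value) rather than anything from the gedcom library.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical simplification: field tag (e.g. "BIRT") to value
// for one already-matched individual in one file.
type Record map[string]string

// fieldPresence reports, for each field, the sorted list of files in which
// that field has a non-empty value.
func fieldPresence(files map[string]Record) map[string][]string {
	presence := make(map[string][]string)
	for file, rec := range files {
		for field, value := range rec {
			if value != "" {
				presence[field] = append(presence[field], file)
			}
		}
	}
	for field := range presence {
		sort.Strings(presence[field])
	}
	return presence
}

// missingFrom returns the entries of all that do not appear in have.
func missingFrom(all, have []string) []string {
	present := make(map[string]bool, len(have))
	for _, f := range have {
		present[f] = true
	}
	var missing []string
	for _, f := range all {
		if !present[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	// Example file names and values only.
	files := map[string]Record{
		"geni.ged":     {"BIRT": "1900", "DEAT": "1970"},
		"ancestry.ged": {"BIRT": "1900"},
		"other.ged":    {"BIRT": "1900", "OCCU": "Tailor"},
	}

	allFiles := make([]string, 0, len(files))
	for name := range files {
		allFiles = append(allFiles, name)
	}
	sort.Strings(allFiles)

	presence := fieldPresence(files)
	fields := make([]string, 0, len(presence))
	for field := range presence {
		fields = append(fields, field)
	}
	sort.Strings(fields)

	for _, field := range fields {
		fmt.Printf("%s: present in %v, missing from %v\n",
			field, presence[field], missingFrom(allFiles, presence[field]))
	}
}
```

Each row of that output corresponds to one field in the combined view, where the "missing from" list is what the coloring would highlight.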
Sample code (not a Go expert, but I think this works):