elliotchance / gedcom

👪 A Go library and CLI tools for encoding, decoding, traversing, merging, comparing, querying and publishing GEDCOM files.
MIT License
97 stars 22 forks source link

Add documentation, diff table headers, implement #324, and refactor logic into function #325

Open Sternbach-Software opened 2 years ago

Sternbach-Software commented 2 years ago

This change is Reviewable

Sternbach-Software commented 2 years ago

Fixes #324

Sternbach-Software commented 2 years ago

The diff table headers add column labels, showing which file is associated with which column, and assigning the title "Similarity score" to the middle column.

elliotchance commented 2 years ago

@shmueldabomb441 just waiting for your code update to rereview

Sternbach-Software commented 2 years ago

Ok.

Sternbach-Software commented 2 years ago

I realized that in my GoLand, settings, it runs go fmt every build. Then isn't this the right format?

Sternbach-Software commented 2 years ago

@elliotchance do I need to do anything like create another pull request for you to approve d2dc486 ?

elliotchance commented 2 years ago

I think there's come confusion here between the comments, so I'll clarify:

  1. There's no need to add a new CLI argument. The matching logic is fine to add to SimpleNode.Equals because that's what it's for.
  2. You need to add a unit test for SimpleNode.Equals to show that it works. I know you tested it manually, I'm not saying it doesn't work but we write unit tests to clarify behavior and prevent regressions in the future. Add a new test case here: https://github.com/elliotchance/gedcom/blob/master/simple_node_test.go#L29
Sternbach-Software commented 2 years ago
  1. I personally wanted to add a flag (even just for myself), because it is potentially less efficient to check for Ancestry sources. If I am not using an Ancestry tree anyway, there is no need to run those double loops (usually only one or two elements are looped over, but still) and all those equality checks and String() calls.

  2. I added automated unit tests to simple_node_test.go in my most recent commit mentioned above. Link to test (it is the last change in the commit, at the bottom of the list on GitHub - sorry I forgot to mention that in the commit message).

I made it a separate test because I couldn't figure out how to integrate it in the already existing function (I tried for a very long time). This is my first time ever being exposed to Go (I think I did pretty well though), and so I did not know how to resolve the errors I was getting. I also didn't 100% understand the logic of the function (I got most of it), which contributed to making it hard to add another case in that function for the Ancestry case. I can add a call in SimpleNode_equals_test() to assert.True(AncestryEquals_Test()) or copy-paste the code from the function if you want, but it would look out of place with the rest of the function.

I agree it makes most sense to have it in the test of the equals method, but because of the above challenges, I would just copy paste it in. Though, I can hear an argument why it may be slightly better to have a separate function for this edge case (I am quite familiar with regression testing, TDD, etc.).

Sternbach-Software commented 2 years ago

I would like to add a flag to ignore sources altogether in SimpleNode.Equals(), because in my import from ancestry to geni, every single person with a source in Ancestry will inevitably be marked as different from the one in geni, but all I care about are the actual facts (e.g. birth/death date, birth/death location, etc.), because my primary goal on Geni is to connect more people with the World Family Tree, and you rarely need sources like census records for that. So, I don't want to see differring sources, only core GEDCOM data, like BIRT, DEAT, etc.

If I add this, will you accept a PR?