danparshall / namegraph


comprehensive test cases #22

Closed danparshall closed 2 years ago

danparshall commented 2 years ago

We should attempt to write some comprehensive test cases. This would include:

1) one-, two-, and three-token surnames (3x)
2) for each, the mother's surname and the father's surname (2x)
3) when the parents are stored in legal vs. social form (2x)
4) when using both the long form (both prenames and both apelidos) and the short form (just 1 prename and apelido) (2x)
5) when both the citizen and each parent have some sort of honorific (2x)
6) when the multi-part surname has both spaces and underscores (2x)

So that's almost 100 cases (3 x 2 x 2 x 2 x 2 x 2 = 96), and I'm sure we'll discover some more as we go along. But if we can handle all of those, there should be very little remaining.
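A minimal sketch of how that case matrix could be enumerated, so that no combination is missed when writing the tests. The dimension names and values below are hypothetical labels for the six conditions in this issue; they are not taken from the namegraph code.

```python
from itertools import product

# Hypothetical labels for the six conditions listed in this issue.
# None of these names come from the namegraph code base.
DIMENSIONS = {
    "surname_tokens": (1, 2, 3),
    "parent": ("mother", "father"),
    "parent_form": ("legal", "social"),
    "name_form": ("long", "short"),
    "honorific": (True, False),
    "separator": ("space", "underscore"),
}


def enumerate_cases(dimensions=DIMENSIONS):
    """Return one dict per combination of dimension values."""
    keys = list(dimensions)
    return [dict(zip(keys, combo)) for combo in product(*dimensions.values())]


cases = enumerate_cases()
print(len(cases))  # 3 * 2**5 = 96
```

Each dict could then drive one test, for example through `pytest.mark.parametrize`. Note that the separator dimension has no effect on one-token surnames, so some of the 96 combinations are redundant and could be filtered out.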

juan-andres-russy commented 2 years ago

I thought I had solved this issue, but then I noticed I didn't correctly handle condition 5 about honorifics; I was not strict with their use on the parents.

One question: do you want _parsefullrow() to delete husband honorifics from 'nombre' inside the function? I noticed that when I put an honorific on the name, it is classified as a prename in the surname test dataset, but there is also the function _fix_husbandhonorific(), which handles honorifics on the mother's name only.

I put some Z/S and B/V problems into the test data on purpose so I can handle that issue later.

danparshall commented 2 years ago

Ultimately, we should expand "fix_husband_honorific()" so that it also handles the case where the honorific is in the citizen's name.

I doubt that it's possible to correctly identify husband honorifics on the first pass, so I don't think we'll be able to put that step into "parse_fullrow()".