dimus closed this issue 1 year ago
I do try to normalize/simplify characters when doing so does not change the semantic meaning. My impression is that ' and ’ are used interchangeably for authors in scientific names, and I picked ' because it is ASCII, which means it will cause fewer problems for people with an unusual default encoding.
The original spelling of the authorship is preserved in JSON format in the verbatim field:
"authorship": {
  "verbatim": "B.D’Orbigny",
  "normalized": "B. D' Orbigny",
  "authors": [
    "B. D' Orbigny"
  ],
  "originalAuth": {
    "authors": [
      "B. D' Orbigny"
    ]
  }
},
It might make sense to keep the verbatim authorship in CSV/TSV output. Let me think about it a bit.
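To illustrate the idea described above (replace the typographic apostrophe with the ASCII one, but keep the original spelling alongside it), here is a minimal Python sketch. It is not gnparser's implementation: the function name `normalize_authorship` is hypothetical, and it only swaps the apostrophe character. It does not reproduce the spacing changes visible in the `normalized` field of the JSON above.

```python
# Hypothetical helper for illustration only; not part of gnparser.
RIGHT_SINGLE_QUOTE = "\u2019"  # the typographic apostrophe ’
ASCII_APOSTROPHE = "'"         # U+0027


def normalize_authorship(verbatim: str) -> dict:
    """Return the original string plus a copy with ’ replaced by '."""
    return {
        "verbatim": verbatim,  # original spelling, untouched
        "normalized": verbatim.replace(RIGHT_SINGLE_QUOTE, ASCII_APOSTROPHE),
    }


result = normalize_authorship("B.D\u2019Orbigny")
print(result["verbatim"])
print(result["normalized"])
```

Keeping both values means a downstream user can still see which character the source dataset actually used.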
@dimus, I've rechecked the original dataset and found that the compilers used both characters:
3 records: Acteocina candei (D’Orbigny, 1841)
37 records: Acteocina candei (D'Orbigny, 1842)
gnparser converted both to an apostrophe in Author, which is OK. I was looking at "D’Orbigny" in the verbatim field and thinking I had entered "D'Orbigny", so that was my mistake and all is well. In my pseudo-duplicate search the results are fine:
Acteocina candei (D’Orbigny, 1841) [3]
Acteocina candei (D'Orbigny, 1842) [37]
From https://github.com/gnames/gnparser/issues/245
Another issue is that "D'Orbigny" in the original is "D’Orbigny" in the gnparser output. Why change UTF-8 27 to e2 80 99?
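For readers unfamiliar with the byte values in the question, the following short Python sketch shows where they come from: `27` is the UTF-8 encoding of the ASCII apostrophe (U+0027), and `e2 80 99` is the UTF-8 encoding of the right single quotation mark (U+2019). The helper name `utf8_hex` is an illustrative assumption, and `bytes.hex` with a separator requires Python 3.8 or later.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a string as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


print(utf8_hex("'"))       # ASCII apostrophe, U+0027 -> 27
print(utf8_hex("\u2019"))  # right single quotation mark, U+2019 -> e2 80 99
```

The two characters look similar in most fonts, so checking the bytes is a quick way to tell which one a given field contains.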