Closed tobymarsden closed 3 years ago
Definitely a bug, and yes, a PR would be fantastic if you are up to it.
@dimus PR at https://github.com/gnames/gnparser/pull/175
I see that preprocessing adds to the problem, because all hybrid characters are substituted with `×`. I will need to think a bit about how to reorganize the code to get the correct verbatim.
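To illustrate the point about preprocessing: a minimal sketch, not gnparser's actual code, of normalizing the hybrid sign per token rather than across the whole input string, so that the original spelling survives as the verbatim value. The `Word` struct, `normalizeHybrid`, and `splitWords` are hypothetical names invented for this example, and treating every standalone `x` as a hybrid sign is a simplifying assumption.

```go
package main

import "strings"

// Word is a hypothetical stand-in for a parsed word.
// It is NOT gnparser's parsed.Word; the fields are assumptions for this sketch.
type Word struct {
	Verbatim   string // text exactly as it appeared in the input
	Normalized string // canonical form used in the output
	IsHybrid   bool   // true when the token is a hybrid sign
}

// normalizeHybrid maps common spellings of the hybrid sign to "×".
// Assumption: any standalone "x", "X", or "×" token is a hybrid sign.
func normalizeHybrid(tok string) (string, bool) {
	switch tok {
	case "x", "X", "×":
		return "×", true
	}
	return tok, false
}

// splitWords tokenizes on whitespace and normalizes each token separately,
// so the original spelling is still available as Verbatim.
func splitWords(name string) []Word {
	fields := strings.Fields(name)
	words := make([]Word, 0, len(fields))
	for _, tok := range fields {
		norm, hybrid := normalizeHybrid(tok)
		words = append(words, Word{Verbatim: tok, Normalized: norm, IsHybrid: hybrid})
	}
	return words
}
```

With this approach, calling `splitWords("Magnolia denudata x Magnolia liliiflora")` would give a hybrid word whose verbatim value is `x` and whose normalized value is `×`, instead of both being empty.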
The problem was largely caused by code debt, where an unnecessary legacy struct `parser.wordNode` was shoehorned into `parsed.Word`. I removed the legacy struct. I also added `test_data_cultivars.md` to `tools/gentest.go` to simplify test generation when many changes are introduced. I am adding a section on how to use the tool to CONTRIBUTING.md.
When parsing `Magnolia x soulangeana`, the `words` section of the details looks like this:

(I wonder if `verbatim` should be `x` instead of `×`, as currently `subsp` is to `subsp.`, but that's a nitpick.)

However, when parsing `Magnolia denudata x Magnolia liliiflora`, the output is:

i.e. the `HYBRID_CHAR` word has empty `verbatim` and `normalized` properties.

The same applies to names like `× Sorbopyrus auricularis`.

Is this a bug, and if so, would you consider a PR?