tobymarsden closed this PR 2 years ago
I've reused the hybrid flag to make consumption of the JSON output easier; notwithstanding that these aren't true botanical hybrids, it seems reasonable to use the term in the broadest sense given that it's a string value with more details anyway.
I think it is OK, because gnparser in general uses very 'broad' semantics in other parts; for example, the virus flag includes everything that is not cellular. I think v1 of gnparser is about practicality and covering its domain, and v2 might become more scientifically accurate in its definitions.
@dimus Awesome, thanks for looking at this!
@tobymarsden I asked around and looked at the codes. It seems that graft-chimeras fall entirely within the realm of the cultivated plant code (ICNCP), so it would be logical to parse them only when the cultivar flag is on. Could you change your PR so that they are 'visible' only when the cultivar flag is used? I think their tests should also go in the cultivar test file.
I think that if people are processing names that are supposed to be in the ICN context, the parser should break on graft-chimera names.
@dimus Makes perfect sense. I'll try to find some time this week to make the changes to the PR.
sounds great @tobymarsden
@dimus The graft-chimera support is now contingent on the -C flag, and parsing breaks on graft chimeras without it.
I've updated the tests so the parsed graft-chimeras are in the cultivars file, and the main test file shows "parsed": false for these names.
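The gating behaviour agreed above (graft-chimeras parse only in cultivar mode and fail otherwise) can be illustrated with a toy sketch. This is not how gnparser's grammar is implemented: parse_name, ParseResult, with_cultivars and the two regular expressions are all hypothetical, and the "GRAFT_CHIMERA" string is an assumed label for the hybrid field. The with_cultivars argument plays the role the -C flag plays in the real CLI.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a graft-chimera formula such as "Crataegus + Mespilus"
# and a named graft-chimera such as "+ Crataegomespilus".
GRAFT_FORMULA = re.compile(r"^[A-Z][a-z]+(?: [a-z]+)? \+ [A-Z][a-z]+(?: [a-z]+)?$")
NAMED_GRAFT = re.compile(r"^\+ ?[A-Z][a-z]+(?: [a-z]+)?$")


@dataclass
class ParseResult:
    parsed: bool
    hybrid: Optional[str] = None  # assumed label, e.g. "GRAFT_CHIMERA"


def parse_name(name: str, with_cultivars: bool = False) -> ParseResult:
    """Toy gate: graft-chimera names parse only when cultivar mode is on."""
    is_graft = bool(GRAFT_FORMULA.match(name) or NAMED_GRAFT.match(name))
    if is_graft:
        if not with_cultivars:
            # In an ICN-only context the name is treated as unparseable.
            return ParseResult(parsed=False)
        # Reuse the hybrid field, with a string that says what kind it is.
        return ParseResult(parsed=True, hybrid="GRAFT_CHIMERA")
    # Placeholder: ordinary names would be handled by the regular grammar.
    return ParseResult(parsed=True)


print(parse_name("Crataegus + Mespilus"))
print(parse_name("Crataegus + Mespilus", with_cultivars=True))
```

Keeping the check in one place means ICN-only runs fail loudly on these names instead of silently misparsing them, which matches the reasoning in the thread.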
@tobymarsden perfect! Trying it now...
It all looks good to me, @tobymarsden, great work, merging...
I'm trying to get gnparser to parse all names in Kew's Plants of the World Online.
I bumped into a parsing failure when dealing with graft-chimeras, e.g.
This PR parses these names successfully without any impact on existing test cases, e.g.
and
I've reused the hybrid flag to make consumption of the JSON output easier; notwithstanding that these aren't true botanical hybrids, it seems reasonable to use the term in the broadest sense given that it's a string value with more details anyway. I had to adjust the stemmer, but I added some stemmer-specific tests.
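Here is a rough sketch of why reusing the hybrid flag eases consumption. A downstream script that already branches on that field picks up graft-chimeras with no schema change, and the string value still distinguishes them. The sketch assumes one JSON object per line with "parsed" and "hybrid" keys, and assumes values such as "GRAFT_CHIMERA" and "NAMED_HYBRID". These field names and values are assumptions for illustration, not a confirmed description of gnparser's output, and hybrid_kinds is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def hybrid_kinds(lines: Iterable[str]) -> Counter:
    """Count parsed records by their (assumed) 'hybrid' string value."""
    kinds: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        rec = json.loads(line)
        kind = rec.get("hybrid")
        # Unparsed names and names without a hybrid value are ignored;
        # graft-chimeras arrive through the same field as ordinary hybrids.
        if rec.get("parsed") and kind:
            kinds[kind] += 1
    return kinds
```

Fed hypothetical records such as {"verbatim": "Crataegus + Mespilus", "parsed": true, "hybrid": "GRAFT_CHIMERA"}, the helper groups graft-chimeras alongside other hybrid kinds while keeping them countable separately.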
The PR duplicates much of the HybridFormula code as the syntax is so close; I've another branch which refactors things to reuse the HybridFormula objects, but there was no performance benefit and the code is harder to follow (for me, anyway). If you prefer that approach, though, I can submit a PR from that branch instead.