GlobalNamesArchitecture / gnparser

Split scientific names into meaningful elements with meta information
https://parser.globalnames.org/
MIT License

Parse "Drosophila obscura-x Burla, 1951" #385

Closed: dimus closed this issue 6 years ago

dimus commented 6 years ago

It is currently parsed as a uninomial.

alexander-myltsev commented 6 years ago

The problem is this piece of the preprocessor: https://github.com/GlobalNamesArchitecture/gnparser/blob/09ffbf8741f8006323b21c534ff002893ade5339/parser/src/main/scala/org/globalnames/parser/Preprocessor.scala#L54-L59

It replaces the `x`, which `lowerChar` in the `word1` rule then cannot parse. There are several ways to resolve this that we should discuss.

This patch should be enough to solve the issue:

diff --git a/parser/src/main/scala/org/globalnames/parser/Parser.scala b/parser/src/main/scala/org/globalnames/parser/Parser.scala
index adb769a..2f5997d 100644
--- a/parser/src/main/scala/org/globalnames/parser/Parser.scala
+++ b/parser/src/main/scala/org/globalnames/parser/Parser.scala
@@ -342,7 +342,7 @@ class Parser(preprocessorResult: Preprocessor.Result,
   }

   def word1: Rule1[CapturePosition] = rule {
-    capturePos((LowerAlpha ~ dash).? ~ lowerChar ~ oneOrMore(lowerChar))
+    capturePos((LowerAlpha ~ dash).? ~ oneOrMore(lowerChar))
   }

   def word2: Rule1[CapturePosition] = rule {
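To make the effect of the `word1` change concrete, here is a rough sketch, not gnparser code, that models the old and new rules as Scala regular expressions. It assumes both `LowerAlpha` and `lowerChar` match only ASCII `a`-`z` (gnparser's `lowerChar` may accept more characters), and it checks whole strings, whereas the PEG rule only needs to consume a prefix of the input. `Word1Sketch`, `oldWord1`, `newWord1` and `matches` are hypothetical names used for illustration only.

```scala
// Hypothetical regex models of the two word1 variants from the diff above.
// Assumption: LowerAlpha and lowerChar are both approximated as ASCII a-z.
object Word1Sketch {
  // Old rule: (LowerAlpha ~ dash).? ~ lowerChar ~ oneOrMore(lowerChar)
  // => optional one-letter-plus-dash prefix, then at least TWO lowercase letters.
  val oldWord1 = "([a-z]-)?[a-z][a-z]+".r

  // New rule: (LowerAlpha ~ dash).? ~ oneOrMore(lowerChar)
  // => optional one-letter-plus-dash prefix, then at least ONE lowercase letter.
  val newWord1 = "([a-z]-)?[a-z]+".r

  // Whole-string match; the real PEG rule only has to match a prefix.
  def matches(r: scala.util.matching.Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    for (w <- Seq("obscura", "x", "a-x")) {
      println(s"$w: old=${matches(oldWord1, w)} new=${matches(newWord1, w)}")
    }
    // prints:
    // obscura: old=true new=true
    // x: old=false new=true
    // a-x: old=false new=true
  }
}
```

Under these assumptions, the only behavioral difference is that a single lowercase letter such as `x` (optionally after a one-letter prefix and a dash) now satisfies `word1`, while the old rule required at least two letters.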