gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

Problems parsing complex author strings with Ex authors when there is an author with initial X #188

Open KatjaSchulz opened 3 years ago

KatjaSchulz commented 3 years ago

Here are some species names that get interpreted as hybrid names by gnparser:

Ceratophysella skarzynskii Weiner WM & Sun X in Weiner, WM, Xie, Z-J, Li, Y & Sun, X, 2019
Oligaphorura kedroviensis Shveenkova YB & Sun X in Sun, X, Shveenkova, YB, Xie, Z-J & Babenko, AB, 2019
Oligaphorura wanglangensis Sun X & Xie Z-J in Sun, X, Shveenkova, YB, Xie, Z-J & Babenko, AB, 2019
Semicerura bryophila Potapov M & Sun X in Potapov, M, Xie, Z-J, Kuprin, A & Sun, X, 2020
Semicerura draconis Potapov M & Sun X in Potapov, M, Xie, Z-J, Kuprin, A & Sun, X, 2020

It looks like the issue is the X initial of the author in the first part of the author string. If I replace that first X initial with another letter but leave the second X initial after the "in", the name gets parsed properly. Names are from COL2021-07-29.
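To make the pattern concrete, here is a minimal Python sketch that flags names in which a standalone "X" token appears before the "in" of the author string. The helper `has_x_initial_before_in` is hypothetical and is not part of gnparser; it only illustrates which inputs match the pattern described above and says nothing about how gnparser works internally. The second name in the example is the first reported name with the leading X initial swapped for Y, mirroring the substitution described above.

```python
import re

# Hypothetical helper, not gnparser code: matches an "X" that stands
# alone as a whitespace-delimited token.
STANDALONE_X = re.compile(r"(?<!\S)X(?!\S)")


def has_x_initial_before_in(name: str) -> bool:
    """Return True if a standalone "X" occurs before the first " in "."""
    # Keep only the part of the string that precedes " in ".
    head, _, _ = name.partition(" in ")
    return bool(STANDALONE_X.search(head))


names = [
    # Reported name: X initial before "in".
    "Ceratophysella skarzynskii Weiner WM & Sun X in Weiner, WM, Xie, Z-J, Li, Y & Sun, X, 2019",
    # Illustrative variant: first X initial replaced with Y.
    "Ceratophysella skarzynskii Weiner WM & Sun Y in Weiner, WM, Xie, Z-J, Li, Y & Sun, X, 2019",
]

for name in names:
    print(has_x_initial_before_in(name), name)
```

A check like this could help pull other affected names out of a checklist before parsing, under the assumption that the leading X initial is indeed the trigger.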

dimus commented 2 years ago

It is a tricky one; I'll think about what to do. Also, I am not sure how to handle authorship in this format, where the initials come after the surname.