Closed seltmann closed 3 years ago
I was able to reproduce the issue using
$ nomer dump discoverlife | grep "Andrena (Lepidandrena) firuzaensis var"
using matcher [discoverlife-taxon]
DiscoverLife name indexing started...
[50590] DiscoverLife names were indexed in 19s (@ 2662 names/s)
https://www.discoverlife.org/mp/20q?search=Andrena+(Lepidandrena)+firuzaensis+var Andrena (Lepidandrena) firuzaensis var SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis Andrena firuzaensis species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Andrena firuzaensis https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis
and found the related name record in the Discover Life bees source:
<tr bgcolor="#f0f0f0">
<td>
<i>
<a href="/mp/20q?search=Andrena+firuzaensis" target="_self">
Andrena firuzaensis
</a>
</i>
<font size="-1" face="sans-serif">
Popov, 1940
</font>
--
<i>
Andrena (Lepidandrena) firuzaensis
</i>
Popov, 1940;
<i>
Andrena (Lepidandrena) firuzaensis var
</i>
atra_homonym4 Popov, 1940;
<i>
Andrena popovella
</i>
Gusenleitner and Schwarz, 2002, replacement name
</td>
</tr>
Looks like there are a little over 200 names with dangling vars:
$ nomer dump discoverlife | grep -P " var\t" | wc -l
using matcher [discoverlife-taxon]
214
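For anyone without grep -P at hand, the same count can be sketched in Python. This is only an illustration: the function name and the sample lines are made up, and it assumes the dump is tab-separated with the name in its own column, as the grep pattern above implies.

```python
import re

# Mirrors the `grep -P " var\t"` filter: a bare "var" directly before a tab,
# i.e. a name column that ends in " var".
DANGLING_VAR = re.compile(r" var\t")


def count_dangling_var(lines):
    """Count tab-separated dump lines that contain a name ending in ' var'."""
    return sum(1 for line in lines if DANGLING_VAR.search(line))


# Hypothetical sample lines (not real nomer output).
sample = [
    "https://example.org/a\tAndrena (Lepidandrena) firuzaensis var\tSYNONYM_OF\tAndrena firuzaensis",
    "https://example.org/b\tAndrena firuzaensis\tSYNONYM_OF\tAndrena firuzaensis",
]
print(count_dangling_var(sample))
```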
Here's a list of those chopped names:
$ cat /home/jorrit/proj/globi/nomer/nomer-taxon-resolver/src/main/resources/org/globalbioticinteractions/nomer/match/discoverlife/bees.xml.gz | gunzip | grep -P "var [^a-z]" | sort | uniq
Ammobates (Ammobates) lativalvis var
Ammobates (Euphileremus) handlirschi var
Andrena (Andrena) nasonii var
Andrena (Lepidandrena) firuzaensis var
Andrena (Parandrena) andrenoides var
Andrena (Ptilandrena) supervirens var
Andrena (Scrapter) imitatrix var
Andrena (Trachandrena) tacitula var
Anthophora (Micranthophora) curta var
Augochlora (Augochloropsis) vesta var
Bombus (Agrobombus) helferanus var
Bombus (Agrobombus) muscorum var
Bombus (Alpigenobombus) tetrachromus var
Bombus (Chromobombus) muscorum var
Bombus (Chromobombus) variabilis_homonym var
Bombus (Diversobombus) wilemani var
Bombus (Hortobombus) consobrinus var
Bombus (Hortobombus) mimeticus var
Bombus (Lapidariobombus) oculatus var
Bombus (Lapidariobombus) rufofasciatus var
Bombus (Lapidariobombus) sicheli var
Bombus (Lapidariobombus) tenellus var
Bombus (Leucobombus) terrestris var
Bombus (Melanobombus) confusus var
Bombus (Orientalibombus) orientalis var
Bombus (Pratobombus) atrocinctus var
Bombus (Pratobombus) biroi var
Bombus (Pratobombus) hypnorum var
Bombus (Pratobombus) impatiens var
Bombus (Pratobombus) parthenius var
Bombus (Pratombombus) mearnsi var
Bombus (Rhodobombus) helleri var
Bombus (Rufipedibombus) eximius var
Bombus (Senexibombus) bicoloratus var
Bombus (Subterraneobombus) fragans var
Bombus (Terrestribombus) lucorum var
Bombus (Terrestribombus) terrestris var
Bremus (Alpigenobombus) dentatus var
Bremus (Alpigenobombus) grahami var
Bremus (Bremus) ignitus var
Bremus (Lapidariobombus) formosellus var
Bremus (Pratobombus) mearnsi var
Bremus (Rufipedibombus) rufipes var
Bremus (Senexibombus) senex var
Bremus (Sibiricobombus) oculatus var
Centris (Epicharis) conica var
Centris (Epicharis) dejeani var
Centris (Epicharis) maculata var
Centris (Epicharis) rustica var
Centris (Epicharis) umbraculata var
Centris (Hemisia) nitens var
Centris (Melanocentris) furcata var
Centris (Melanocentris) obsoleta var
Centris (Melanocentris) petreae var
Centris (Ptilotopus) denudans var
Ceratina (Ceratinidia) hieroglyphica var
Ceratina (Ceratinidia) lepida var
Ceratina speculifrons var
Chalicodoma (Chalicodoma) lefebvrei var
Euaspis (Parevaspis) basalis var
Euglossa (Eufriesea) magrettii var
Euglossa (Euglossa) cordata var
Euglossa (Euglossa) variabilis var
Euglossa (Eulaema) nigrita var
Euglossa (Eulema) dimidiata var
Euglossa (Eulema) mexicana var
Euglossa (Eulema) nigrifacies var
Euglossa (Eumorpha) combinata var
Euglossa (Eumorpha) magrettii var
Euglossa (Eumorpha) mariana var
Exomalopsis (Anthophorula) compactula var
Halictus (Chloralictus) pilosus var
Halictus (Corynura) corynogaster var
Halictus (Evylaeus) arcuatus var
Halictus (Thrichostoma) sjoestedti var
Hylaeus (Deranchylaeus) tenuis var
Megachile (Argyropile) parallela var
Megachile (Chalicodoma) lefebvrei var
Megachile (Chalicodoma) manicata var
Megachile (Chalicodoma) monstrifica var
Megachile (Chalicodoma) muraria var
Megachile (Chalicodoma) pyrenaica var
Megachile (Chelostomoides) exilis var
Megachile (Delomegachile) gemula var
Megachile (Delomegachile) melanophaea var
Megachile (Delomegachile) vidua var
Megachile (Eumegachile) bilobata var
Megachile (Eumegachile) sculpturalis var
Megachile (Litomegachile) brevis var
Megachile (Pseudocentron) pruina var
Megachile (Sayapis) frugalis var
Melissa (Epiclopus) gayi var
Melitoma (Ancyloscelis) chilensis var
Nomada (Holonomada) edwardsii var
Nomada (Micronomada) modesta var
Nomada (Nomadula) rhodosoma var
Nomada (Xanthidium) crotchii var
Nomada (Xanthidium) vallesina var
Nomia (Crocisaspidia) postscutellaris var
Nomia (Epinomia) bakeri var
Nomia (Hoplonomia) pulchribalteata var
Osmia (Melanosmia) nigrifrons var
Paratrigona (Paratrigona) ornaticeps var
Perdita (Perdita) eriastri var
Perdita (Perdita) macswaini var
Perdita (Pygoperdita) malacothricis var
Psaenythia (Psaenythia) bizonata var
Psaenythia (Psaenythia) rubripes var
Psithyrus (Allopsithyrus) barbutellus var
Psithyrus (Allopsithyrus) maxillosus var
Psithyrus (Ashtonipsithyrus) distinctus var
Psithyrus (Ashtonipsithyrus) vestalis var
Psithyrus (Fernaldaepsithyrus) flavidus var
Psithyrus (Fernaldaepsithyrus) norvegicus var
Psithyrus (Fernaldaepsithyrus) quadricolor var
Psithyrus (Fernaldaepsithyrus) sylvestris var
Psithyrus (Metapsithyrus) campestris var
Psithyrus (Metapsithyrus) pieli var
Psithyrus (Psithyrus) acutisquameus var
Sphecodes hispanicus subvar
Stenotritus elegans var
Trigona (Cephalotrigona) capitata var
Trigona (Geotrigona) acapulconis var
Trigona (Geotrigona) leucogastra var
Trigona (Hypotrigona) pendleburyi var
Trigona (Lepidotrigona) nitidiventris var
Trigona (Lepidotrigona) terminata var
Trigona (Lepidotrigona) ventralis var
Trigona (Lestrimelitta) limao var
Trigona (Nannotrigona) postica var
Trigona (Nannotrigona) testaceicornis var
Trigona (Oxytrigona) tataira var
Trigona (Parapartamona) zonata var
Trigona (Paratrigona) lineata var
Trigona (Paratrigona) opaca var
Trigona (Patera) testacea var
Trigona (Scaptotrigona) mexicana var
Trigona (Scaptotrigona) pectoralis var
Trigona (Tetragona) buchwaldi var
Trigona (Tetragona) dorsalis var
Trigona (Tetragona) fimbriata var
Trigona (Tetragona) fusco-balteata var
Trigona (Tetragona) fuscobalteata var
Trigona (Tetragona) heideri var
Trigona (Tetragona) jaty var
Trigona (Tetragona) nigra var
Trigona (Tetragona) sarawakensis var
Trigona (Tetragona) subgrisea var
Trigona (Trigona) dimidiata var
Trigona (Trigona) hypogea var
Trigona (Trigona) pallida var
Xylocopa (Afroxylocopa) caffra var
Xylocopa (Afroxylocopa) nigrita var
Xylocopa (Afroxylocopa) scioensis var
Xylocopa (Koptorthosoma) caerulea var
Xylocopa (Koptorthosoma) caeruleiformis var
Xylocopa (Koptortosoma) flavicollis var
Xylocopa (Xylocopa) rufipes var
@seltmann the root cause for the var name chopping appears to be a syntax error on the Discover Life side.
I've implemented a workaround, and ideally the authors of Discover Life would correct the entries in which the var names are chunked with the authorship string.
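To make the idea of the workaround concrete, here is a minimal sketch in Python. It is not nomer's actual code: rejoin_var and its arguments are hypothetical, and it assumes the parser sees the italic name and the plain text that follows it as two separate strings, as in the HTML record above.

```python
import re

# Hypothetical helper, not nomer's implementation.
ENDS_IN_VAR = re.compile(r"(?:^|\s)(?:sub)?var$")
HOMONYM_SUFFIX = re.compile(r"_homonym\d*$")


def rejoin_var(name, trailing):
    """Move a variety epithet from the authorship text back onto the name.

    `name` is the italic part, `trailing` the plain text that follows it.
    Returns (repaired_name, authorship).
    """
    name = name.strip()
    trailing = trailing.strip()
    if not ENDS_IN_VAR.search(name):
        return name, trailing  # nothing dangling, leave untouched
    parts = trailing.split(None, 1)
    if not parts:
        return name, trailing  # no epithet available to rejoin
    # Drop a "_homonym" marker (optionally numbered) from the epithet.
    epithet = HOMONYM_SUFFIX.sub("", parts[0])
    authorship = parts[1] if len(parts) > 1 else ""
    return f"{name} {epithet}", authorship.rstrip(";").strip()


print(rejoin_var("Andrena (Lepidandrena) firuzaensis var",
                 "atra_homonym4 Popov, 1940;"))
```

With the record shown earlier, this turns the chopped name into Andrena (Lepidandrena) firuzaensis var atra and leaves Popov, 1940 as the authorship.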
Note that the example below relates to a _homonym in addition to a chopped var name:
$ nomer dump discoverlife | grep "Andrena (Lepidandrena) firuzaensis var"
using matcher [discoverlife-taxon]
DiscoverLife name indexing started...
[50590] DiscoverLife names were indexed in 19s (@ 2662 names/s)
https://www.discoverlife.org/mp/20q?search=Andrena+(Lepidandrena)+firuzaensis+var+atra Andrena (Lepidandrena) firuzaensis var atra NONE https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis Andrena firuzaensis species Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Andrena firuzaensis https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Andrena+firuzaensis
Note the example for https://www.discoverlife.org/mp/20q?search=Megachile+lefebvrei .
One var name appears to be entered correctly (i.e., Megachile lefeburei var albomaculata Friese, 1898); however, suspicious var names are also included: Megachile (Chalicodoma) lefebvrei var albida Pérez, 1897 and Megachile (Chalicodoma) muraria var variabilis Friese, 1920.
You can see the issue visually: the erroneous entries do not have fully italicized names, because the var part is not italic.
See attached screenshot.
Another exception to the exception was found, in which a comma was added between the var and the dangling var name.
Example:
Ceratina laevifrons var , moricei Friese, 1899
See https://www.discoverlife.org/mp/20q?search=Ceratina+moricei and attached screenshot
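A minimal sketch of how such a stray comma could be normalized before the name is parsed. The helper name is hypothetical and the regex is an assumption about what counts as a stray comma (a comma directly after var or subvar, followed by a lowercase epithet); it is not taken from nomer.

```python
import re

# A comma that sits between "var"/"subvar" and a lowercase epithet.
STRAY_COMMA = re.compile(r"\b((?:sub)?var)\s*,\s*(?=[a-z])")


def drop_comma_after_var(text):
    """Remove a comma wedged between 'var' and the variety epithet."""
    return STRAY_COMMA.sub(r"\1 ", text)


print(drop_comma_after_var("Ceratina laevifrons var , moricei Friese, 1899"))
```

The comma inside the authorship (Friese, 1899) is left alone because it does not follow var.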
Also, please note that there's a dangling var with name A.
Stenotritus elegans var A Cockerell, 1914
See https://www.discoverlife.org/mp/20q?search=Stenotritus+elegans and attached screenshot.
After implementation of workarounds, the following two names remain, both of which are var A names.
$ nomer list discoverlife | grep -P "var [^a-z]" | sort | uniq
using matcher [discoverlife-taxon]
DiscoverLife name indexing started...
[50590] DiscoverLife names were indexed in 19s (@ 2662 names/s)
https://www.discoverlife.org/mp/20q?search=Ceratina+speculifrons+var+A Ceratina speculifrons var A SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Ceratina+speculifrons Ceratina speculifrons species Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Ceratina speculifrons https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Apidae | https://www.discoverlife.org/mp/20q?search=Ceratina+speculifrons kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Ceratina+speculifrons
https://www.discoverlife.org/mp/20q?search=Stenotritus+elegans+var+A Stenotritus elegans var A SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Stenotritus+elegans Stenotritus elegans species Animalia | Arthropoda | Insecta | Hymenoptera | Stenotritidae | Stenotritus elegans https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Stenotritidae | https://www.discoverlife.org/mp/20q?search=Stenotritus+elegans kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Stenotritus+elegans
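To tell the remaining var A names apart from names that are still truly chopped, a small classifier along these lines may help. This is a sketch only: classify_var and its three labels are made up for illustration, and "placeholder" is assumed to mean var followed by a single uppercase letter.

```python
import re

DANGLING = re.compile(r"\b(?:sub)?var$")
PLACEHOLDER = re.compile(r"\b(?:sub)?var [A-Z]$")


def classify_var(name):
    """Label a name as 'dangling', 'placeholder' (e.g. 'var A'), or 'ok'."""
    name = name.strip()
    if DANGLING.search(name):
        return "dangling"
    if PLACEHOLDER.search(name):
        return "placeholder"
    return "ok"


for n in ("Stenotritus elegans var A",
          "Andrena (Trachandrena) tacitula var",
          "Andrena (Lepidandrena) firuzaensis var atra"):
    print(n, "->", classify_var(n))
```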
@jhpoelen the var typically is not italic in writing scientific names so this would not be seen as an error. So Megachile (Chalicodoma) muraria var variabilis Friese, 1920 is incorrect and should be Megachile (Chalicodoma) muraria var variabilis Friese, 1920
I will ask the authors about Stenotritus elegans var A and Ceratina speculifrons var A, and ask them to correct Ceratina laevifrons var , moricei Friese, 1899 and to be consistent regarding italics of var.
@seltmann thanks for sharing your thoughts on taxonomic "var" names formatting.
Because most of the var parsing issues have been addressed, I'll close this issue and open newer, narrower ones describing the external data issues related to the:
Please feel free to re-open this issue if you'd like to proceed in some other way.
In a review of the nomer dump discoverlife output, some of the accepted names need further data cleaning. A few names have a trailing var, which indicates that the variety name after the var has been deleted.
For example:
Andrena (Lepidandrena) firuzaensis var should be changed to Andrena (Lepidandrena) firuzaensis var atra
Andrena (Trachandrena) tacitula var should be changed to Andrena (Trachandrena) tacitula var grossulariae