globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

Unexpected _sic in discoverlife names #51

Closed seltmann closed 2 years ago

seltmann commented 2 years ago

In a review of the discoverlife dump, some of the accepted names need further data cleaning. For any name ending in _sic, the underscore and everything after it should be removed.

For example: Andrena apicatus_sic should be changed to Andrena apicatus
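The requested cleanup can be sketched in a few lines of Python. This is only an illustration of the rule described above, not nomer's actual implementation (nomer is a separate code base); the helper name `strip_sic` is hypothetical.

```python
import re

# Hypothetical helper illustrating the requested rule; not taken from nomer.
# Matches a literal "_sic" only at the very end of the name.
_SIC_SUFFIX = re.compile(r"_sic$")


def strip_sic(name: str) -> str:
    """Return the name without a trailing '_sic' marker, if one is present."""
    return _SIC_SUFFIX.sub("", name.strip())


print(strip_sic("Andrena apicatus_sic"))  # Andrena apicatus
print(strip_sic("Andrena apicata"))       # Andrena apicata (unchanged)
```

Anchoring the pattern at the end of the string leaves names without the suffix untouched.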

jhpoelen commented 2 years ago

Now removing the _sic suffix as suggested.

Before the change we had:

$ nomer list discoverlife | grep -P "Andrena apicatus" 
using matcher [discoverlife-taxon]
DiscoverLife name indexing started...
[50590] DiscoverLife names were indexed in 19s (@ 2662 names/s)
https://www.discoverlife.org/mp/20q?search=Andrena+apicatus_sic Andrena apicatus_sic    SYNONYM_OF  https://www.discoverlife.org/mp/20q?search=Andrena+apicata  Andrena apicata species     Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Andrena apicata    https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Andrena+apicata  kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Andrena+apicata

After the change, the same query returns the name without the _sic suffix:

$ nomer list discoverlife | grep -P "Andrena apicatus" 
using matcher [discoverlife-taxon]
DiscoverLife name indexing started...
[50590] DiscoverLife names were indexed in 18s (@ 2810 names/s)
https://www.discoverlife.org/mp/20q?search=Andrena+apicatus Andrena apicatus    SYNONYM_OF  https://www.discoverlife.org/mp/20q?search=Andrena+apicata  Andrena apicata species     Animalia | Arthropoda | Insecta | Hymenoptera | Andrenidae | Andrena apicata    https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Andrenidae | https://www.discoverlife.org/mp/20q?search=Andrena+apicata  kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Andrena+apicata  
seltmann commented 2 years ago

@jhpoelen _sic still appears in the accepted name and its link, for example:

https://www.discoverlife.org/mp/20q?search=Coelioxys+(Coelioxys)+pasteeli Coelioxys (Coelioxys) pasteeli SYNONYM_OF https://www.discoverlife.org/mp/20q?search=Coelioxys+pasteeli_sic Coelioxys pasteeli_sic species Animalia | Arthropoda | Insecta | Hymenoptera | Megachilidae | Coelioxys pasteeli_sic https://www.discoverlife.org/mp/20q?search=Animalia | https://www.discoverlife.org/mp/20q?search=Arthropoda | https://www.discoverlife.org/mp/20q?search=Insecta | https://www.discoverlife.org/mp/20q?search=Hymenoptera | https://www.discoverlife.org/mp/20q?search=Megachilidae | https://www.discoverlife.org/mp/20q?search=Coelioxys+pasteeli_sic kingdom | phylum | class | order | family | species https://www.discoverlife.org/mp/20q?search=Coelioxys+pasteeli_sic
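
A rough way to find every column that still carries the marker, rather than only the first name column, is sketched below. It assumes the listing is tab-separated; the function name `columns_with_sic` is hypothetical, and the sample row is just the first five columns of the row quoted above.

```python
from typing import List


def columns_with_sic(row: str, sep: str = "\t") -> List[int]:
    """Return the 0-based indices of columns that still contain '_sic'."""
    fields = row.rstrip("\n").split(sep)
    return [i for i, field in enumerate(fields) if "_sic" in field]


# First five columns of the row quoted above, joined with tabs (assumed separator).
sample = "\t".join([
    "https://www.discoverlife.org/mp/20q?search=Coelioxys+(Coelioxys)+pasteeli",
    "Coelioxys (Coelioxys) pasteeli",
    "SYNONYM_OF",
    "https://www.discoverlife.org/mp/20q?search=Coelioxys+pasteeli_sic",
    "Coelioxys pasteeli_sic",
])

print(columns_with_sic(sample))  # [3, 4]
```

Here the provided name and its link are clean, while the resolved name and its link (columns 3 and 4) still carry the suffix. On the full row, the path and path-id columns would be flagged as well.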