globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

add support for indexing and matching against batnames.org #91

Open jhpoelen opened 2 years ago

jhpoelen commented 2 years ago

related to #90

jhpoelen commented 2 years ago

with newly added (basic) support for matching against batnames, I found:

$ nomer ls batnames | wc -l
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [batnames]
1456

which matches the batnames.org website.

Also, an example of a single match:

$ echo -e "\tRhinolophus sinicus" | nomer append batnames
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [batnames]
    Rhinolophus sinicus HAS_ACCEPTED_NAME   https://batnames.org/species/Rhinolophus%20sinicus  Rhinolophus sinicus     Chinese Rufous Horseshoe Bat @en                https://batnames.org/species/Rhinolophus%20sinicus  

or

$ echo -e "\tRhinolophus sinicus" | nomer append --include-header batnames | mlr --itsvlite --omd cat
| providedExternalId | providedName | relationName | resolvedExternalId | resolvedName | resolvedRank | resolvedCommonNames | resolvedPath | resolvedPathIds | resolvedPathNames | resolvedExternalUrl | resolvedThumbnailUrl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Rhinolophus sinicus | HAS_ACCEPTED_NAME | https://batnames.org/species/Rhinolophus%20sinicus | Rhinolophus sinicus |  | Chinese Rufous Horseshoe Bat @en |  |  |  | https://batnames.org/species/Rhinolophus%20sinicus |  |

jhpoelen commented 2 years ago

@ajacsherman here's the list of all indexed batnames names retrieved via

$ nomer ls --include-header batnames | mlr --itsvlite --csv cat

batnames.csv

jhpoelen commented 2 years ago

In attempting to align MDD with batnames using:

$ curl "https://raw.githubusercontent.com/mammaldiversity/mammaldiversity.github.io/master/_data/mdd.csv" | mlr --csv filter '$order == "CHIROPTERA"' | mlr --csv cut -f sciName | sed 's/_/ /g' | sed 's/^/\t/g' | nomer append batnames | grep NONE | head 
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [batnames]
    sciName NONE        sciName                     
    Chironax tumulus    NONE        Chironax tumulus        
    Lissonycteris angolensis    NONE        Lissonycteris angolensis
    Coelops hirsutus    NONE        Coelops hirsutus        
    Doryrhina corynophyllus NONE        Doryrhina corynophyllus     
    Doryrhina edwardshilli  NONE        Doryrhina edwardshilli      
    Doryrhina muscinus  NONE        Doryrhina muscinus      
    Doryrhina semoni    NONE        Doryrhina semoni        
    Doryrhina stenotis  NONE        Doryrhina stenotis      
    Doryrhina wollastoni    NONE        Doryrhina wollastoni

it appears that 61 names defined in MDD have no accepted-name match in batnames (the `sciName` row in the output above is the CSV header, not a name).

via

$ curl "https://raw.githubusercontent.com/mammaldiversity/mammaldiversity.github.io/master/_data/mdd.csv" | mlr --csv filter '$order == "CHIROPTERA"' | mlr --csv cut -f sciName | sed 's/_/ /g' | sed 's/^/\t/g' | tail -n+2 | nomer append batnames | grep NONE | wc -l
61
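
To see where the unmatched names cluster, the NONE rows can be grouped by genus. The following is a minimal sketch, not part of nomer: `none_by_genus` is a hypothetical helper, and it assumes the tab-separated layout shown above, with the provided name in column 2 and the relation in column 3.

```shell
#!/bin/sh
# none_by_genus (hypothetical helper): read nomer TSV rows on stdin
# and count the rows whose relation is NONE, grouped by genus.
# Assumes column 2 is providedName and column 3 is relationName.
none_by_genus() {
  tab=$(printf '\t')
  awk -F'\t' '
    $3 == "NONE" {
      split($2, parts, " ")   # genus is the first word of the name
      count[parts[1]]++
    }
    END { for (g in count) printf "%s\t%s\n", count[g], g }
  ' | sort -t "$tab" -k1,1nr -k2,2   # most frequent genus first
}
```

In the pipeline above, `| none_by_genus` would take the place of `| grep NONE | wc -l`. Keeping `tail -n+2` matters here, since otherwise the `sciName` header row would be counted as a genus.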

This prompted a response from Nancy S., author of batnames, pointing out that the batnames integration is far from complete:


Some of the names you found are simply listed under different genera in Batnames vs MDD. For example, this from the "behind the scenes" comments in Batnames: Doryrhina was previously considered a synonym of Hipposideros; but clearly distinct; see Foley et al. (2017). Based on Foley et al. (2017), some authors (Tuneu-Corral, 2019; Mammal Diversity Database, 2021) have transferred the Hipposideros species corynophyllus, muscinus, semoni, stenotis, and wollastoni to Doryrhina; however, Foley et al. (2017) did not conduct any molecular analyses with these species and explicitly stated that these species were in need of additional study to determine their placement. We prefer to retain these species in Hipposideros until that review can be conducted. The species edwardshilli has also been transferred to Doryrhina by Tuneu-Corral (2019) and the Mammal Diversity Database (2021), without comment.

So, additional work is needed to support synonyms and names that batnames and MDD place in different genera.
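
Until synonyms are supported, one way to probe the genus differences described above is to rewrite the genus before matching. This is only a sketch: `swap_genus` is a hypothetical helper, not a nomer feature, and whether the rewritten names actually resolve in batnames has not been checked here.

```shell
#!/bin/sh
# swap_genus FROM TO (hypothetical helper): for tab-separated rows on stdin,
# replace the genus in column 2 when it equals FROM; other rows pass through.
swap_genus() {
  awk -F'\t' -v OFS='\t' -v from="$1" -v to="$2" '
    {
      n = split($2, parts, " ")
      if (n >= 2 && parts[1] == from) {
        # keep the species epithet, replace only the leading genus
        $2 = to substr($2, length(from) + 1)
      }
      print
    }'
}
```

For example, `printf '\tDoryrhina semoni\n' | swap_genus Doryrhina Hipposideros | nomer append batnames` would ask batnames about the name under the genus it reportedly retains.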