globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

wfo: species names only match when quoted #164

Closed jhpoelen closed 1 year ago

jhpoelen commented 1 year ago
echo -e "\tAster persaliens" | nomer append wfo

produces

    Aster persaliens    NONE        Aster persaliens                                

whereas

echo -e "\t\"Aster persaliens\"" | nomer append wfo

produces

    "Aster persaliens"  HAS_UNCHECKED_NAME  WFO:0000000002  "Aster persaliens"  E.S.Burgess species     "Aster persaliens"  WFO:0000000002  species     http://www.worldfloraonline.org/taxon/wfo-0000000002

It appears that, for some reason, the species names in the wfo index are wrapped in double quotes.

Suggest re-indexing with the quotes removed.
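To illustrate why the unquoted name fails to match, here is a minimal sketch that assumes the lookup is an exact string comparison against the stored name. The `index` variable and the `lookup` function are hypothetical stand-ins and are not part of nomer.

```shell
# Hypothetical one-entry "index": the name as stored, including the quotes.
index='"Aster persaliens"'

# Hypothetical lookup: exact, whole-line, fixed-string match against the index.
lookup() {
  if printf '%s\n' "$index" | grep -qxF -- "$1"; then
    echo match
  else
    echo NONE
  fi
}

lookup 'Aster persaliens'     # unquoted query does not equal the stored value
lookup '"Aster persaliens"'   # quoted query equals the stored value
```

Under that assumption, the first call prints NONE and the second prints match, mirroring the two nomer results shown above.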

jhpoelen commented 1 year ago

The root cause appears to be that WFO changed their classification export method.

now,

via

preston cat\
 --no-cache\
 --anchor hash://sha256/12051b8aa59930d6561a3ed46b7cf3f67a31a98445a457d78894f6b8a8e81641\
 --remote https://zenodo.org/record/8326175/files/,https://zenodo.org/record/8327611/files/\
 'zip:hash://sha256/476f0b01cb943ac34698f510f5df77722846b72f3fb3ad37a0039d338d38e033!/classification.csv'\
 | head -n2\
 | mlr --itsvlite --oxtab cat

produces

taxonID                  wfo-0001302010
scientificNameID         
localID                  9905237
scientificName           "Schoenoxiphium ecklonii var. ecklonii"
taxonRank                variety
parentNameUsageID        
scientificNameAuthorship 
family                   Cyperaceae
subfamily                
tribe                    
subtribe                 
genus                    Schoenoxiphium
subgenus                 
specificEpithet          ecklonii
infraspecificEpithet     ecklonii
verbatimTaxonRank        variety
nomenclaturalStatus      
namePublishedIn          
taxonomicStatus          Synonym
acceptedNameUsageID      wfo-0000528775
originalNameUsageID      
nameAccordingToID        
taxonRemarks             "Source in seed data: tro More details could be found in <a href=http://www.theplantlist.org/tpl1.1/record/tro-9905237 >The Plant List v.1.1.</a> Originally in <a href=http://www.theplantlist.org/tpl/record/tro-9905237 >The Plant List v.1.0</a>"
created                  2022-04-16
modified                 2022-04-20
references               http://www.theplantlist.org/tpl1.1/record/tro-9905237
source                   "World Checklist of Vascular Plants. Facilitated by the Royal Botanic Gardens, Kew."
majorGroup               A
tplID                    tro-9905237

but previously,

via

preston cat\
 --no-cache\
 --anchor hash://sha256/12051b8aa59930d6561a3ed46b7cf3f67a31a98445a457d78894f6b8a8e81641\
 --remote https://zenodo.org/record/8326175/files/,https://zenodo.org/record/8327611/files/\
 'zip:hash://sha256/e6f0b5079fc57a9a13874036473e19fbbf4b8bbc93c6dadeb82d44a60c7552fb!/classification.txt'\
 | head -n2\
 | mlr --itsvlite --oxtab cat

produced

taxonID                  wfo-0000000001
scientificNameID         urn:lsid:ipni.org:names:195146-1
localID                  GCC-FA54B065-8C1D-48CC-8CE0-000012FB41F0
scientificName           Cirsium caput-medusae
taxonRank                SPECIES
parentNameUsageID        
scientificNameAuthorship Schur ex Nyman
family                   Asteraceae
subfamily                
tribe                    
subtribe                 
genus                    Cirsium
subgenus                 
specificEpithet          caput-medusae
infraspecificEpithet     
verbatimTaxonRank        
nomenclaturalStatus      
namePublishedIn          Consp. Fl. Eur. 2: 408 (1879)
taxonomicStatus          SYNONYM
acceptedNameUsageID      wfo-0000027702
originalNameUsageID      
nameAccordingToID        
taxonRemarks             More details could be found in <a href=http://www.theplantlist.org/tpl1.1/record/gcc-1 >The Plant List v.1.1.</a> Originally in <a href=http://www.theplantlist.org/tpl/record/gcc-1 >The Plant List v.1.0</a>
created                  2012-02-11
modified                 
references               http://www.theplantlist.org/tpl1.1/record/gcc-1
source                   gcc
majorGroup               A
tplId                    http://www.theplantlist.org/tpl1.1/record/gcc-1

no quotes were used.
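A quick way to check whether a given export wraps values in quotes is to count the rows in which at least one tab-separated field both starts and ends with a double quote. This is a sketch only: the function name `count_quoted_rows` is hypothetical, and the two sample rows are abbreviated (taxonID, scientificName, taxonRank) versions of the records shown in this issue.

```shell
# Hypothetical helper: count input lines that contain at least one
# tab-separated field wrapped in double quotes.
count_quoted_rows() {
  awk -F '\t' '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*"$/) { n++; break }
      }
    }
    END { print n + 0 }
  '
}

# One row in the newer (quoted) style, one in the older (unquoted) style.
printf 'wfo-0001302010\t"Schoenoxiphium ecklonii var. ecklonii"\tvariety\nwfo-0000000001\tCirsium caput-medusae\tSPECIES\n' \
  | count_quoted_rows
```

For the two sample rows this prints 1. In practice the function could be fed the output of the `preston cat` commands above instead of the `printf` sample.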

Suggest explicitly removing the quotes, even though the IANA tab-separated values format (https://www.iana.org/assignments/media-types/text/tab-separated-values) does not technically require them.
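As a sketch of what explicit quote removal could look like on tab-separated input, the filter below strips one pair of surrounding double quotes from every field and leaves all other content untouched. This is an assumption about the general approach, not the code that went into nomer, and the function name `unquote_tsv` is hypothetical.

```shell
# Hypothetical filter: remove one pair of surrounding double quotes
# from each tab-separated field; unquoted fields pass through unchanged.
unquote_tsv() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*"$/) {
          $i = substr($i, 2, length($i) - 2)
        }
      }
      print
    }
  '
}

# Abbreviated row (taxonID, scientificName, taxonRank) from the newer export.
printf 'wfo-0001302010\t"Schoenoxiphium ecklonii var. ecklonii"\tvariety\n' | unquote_tsv
```

The sketch does not handle quotes embedded inside a value (for example doubled quotes used as an escape); whether the WFO export contains such cases is not established in this issue.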

jhpoelen commented 1 year ago

With the changes in place, running

echo -e "\tAster persaliens"\
 | nomer append --include-header wfo\
 | mlr --itsvlite --oxtab cat

yielded

providedExternalId      
providedName            Aster persaliens
relationName            HAS_UNCHECKED_NAME
resolvedExternalId      WFO:0000000002
resolvedName            Aster persaliens
resolvedAuthorship      E.S.Burgess
resolvedRank            species
resolvedCommonNames     
resolvedPath            Aster persaliens
resolvedPathIds         WFO:0000000002
resolvedPathNames       species
resolvedPathAuthorships 
resolvedExternalUrl     http://www.worldfloraonline.org/taxon/wfo-0000000002

jhpoelen commented 1 year ago

The fix is available via https://github.com/globalbioticinteractions/nomer/releases/tag/0.5.5