AtlasOfLivingAustralia / name-preprocessing

Name source preprocessing for the ALA taxonomic index

NZOR processing not detecting authorship properly #9

Closed charvolant closed 1 year ago

charvolant commented 1 year ago

E.g.

NZOR-6-52652    NZOR-6-52652    http://data.nzor.org.nz/names/731bb93e-c6ef-41d9-9159-705bfdeb0094  NZOR-6-52652    NZOR-6-55337    Lepus europaeus occidentalis de Winton, 1898                        1898        Animalia    Chordata    Mammalia    Lagomorpha  Leporidae   Lepus       europaeus   occidentalis    subsp       de Winton       ICZN    2018-06-01 15:07:00.877                         NZOR Condensed

The full name gives the ICZN-style form "de Winton, 1898", but the authorship field is simply "de Winton". This mismatch leads to an error in the output.

charvolant commented 1 year ago

Preprocessing the NZOR dataset is complicated by two things: the NZOR scientificName is the name + authorship combination, and the scientificNameAuthorship field contains only the primary author's name. For example:

| scientificName (NZOR) | scientificNameAuthorship (NZOR) | namePublishedInYear | scientificName (ALA) | scientificNameAuthorship (ALA) |
| --- | --- | --- | --- | --- |
| Weissia microcarpa Hook.f. & Wilson | Hook.f. & Wilson | 1859 | Weissia microcarpa | Hook.f. & Wilson |
| Mniodendron comosum (Labill.) Lindb. var. comosum | Labill. | | Mniodendron comosum var. comosum | (Labill.) Lindb. |
| Enicmus puncticeps (Broun, 1886) | Broun | 1886 | Enicmus puncticeps | (Broun, 1886) |

Attempt to split the scientific name into name, full authorship and complete name
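
The split described above could be sketched roughly as follows. This is an illustration only, not the code in this repository. It assumes the genus and epithet columns of the NZOR row are available (as they are in the Lepus example) and treats every token that is not one of those parts, or a rank marker sitting between them, as authorship. The names `split_scientific_name`, `SplitName` and `RANK_MARKERS` are hypothetical, and the rank marker list is deliberately incomplete.

```python
from typing import NamedTuple, Sequence

# Rank markers that belong to the name, not the authorship.
# Illustrative list only (an assumption, not taken from the project).
RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "subf."}


class SplitName(NamedTuple):
    name: str        # name without authorship
    authorship: str  # full authorship, including parentheses and year
    complete: str    # name followed by authorship


def split_scientific_name(full_name: str, name_parts: Sequence[str]) -> SplitName:
    """Split a name + authorship string using the known name parts.

    name_parts is the genus and epithets in the order they appear,
    e.g. ["Lepus", "europaeus", "occidentalis"].
    """
    remaining = list(name_parts)
    name_tokens, author_tokens = [], []
    for token in full_name.split():
        if remaining and token == remaining[0]:
            # Next expected genus or epithet.
            name_tokens.append(token)
            remaining.pop(0)
        elif remaining and token in RANK_MARKERS:
            # A rank marker only counts while an epithet is still expected,
            # so a trailing author abbreviation such as "f." stays in the authorship.
            name_tokens.append(token)
        else:
            author_tokens.append(token)
    name = " ".join(name_tokens)
    authorship = " ".join(author_tokens)
    return SplitName(name, authorship, f"{name} {authorship}".strip())


result = split_scientific_name(
    "Mniodendron comosum (Labill.) Lindb. var. comosum",
    ["Mniodendron", "comosum", "comosum"],
)
print(result.name)        # Mniodendron comosum var. comosum
print(result.authorship)  # (Labill.) Lindb.
```

Matching against the known name parts, rather than guessing from capitalisation, keeps a lowercase author prefix such as "de" in "de Winton, 1898" out of the name. It would still fail if an author token happened to be identical to the next expected epithet.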