NeotomaDB / DwC-Mapping

A document explaining how we will map Neotoma against the DarwinCore Schema
MIT License

Managing `identificationQualifiers`. #8

Open SimonGoring opened 7 years ago

SimonGoring commented 7 years ago

In Issue #7 there is a comment:

The scientificName field contains the identificationQualifier even though you were able to break out the qualifier into its own field. The scientificName field should really contain only the most specifically unequivocally determined scientificName (not the identification information). If you can keep those separate, that is best, otherwise the migrator needs to do that.

The DarwinCore terms describe the identificationQualifier as:

1) For the determination "Quercus aff. agrifolia var. oxyadenia", identificationQualifier would be "aff. agrifolia var. oxyadenia" with accompanying values "Quercus" in genus, "agrifolia" in specificEpithet, "oxyadenia" in infraspecificEpithet, and "var." in rank.

2) For the determination "Quercus agrifolia cf. var. oxyadenia", identificationQualifier would be "cf. var. oxyadenia" with accompanying values "Quercus" in genus, "agrifolia" in specificEpithet, "oxyadenia" in infraspecificEpithet, and "var." in rank.

Neotoma has 3000+ records using the cf. modifier, 398 using undiff., 1726 using -type (or variants), and 439 using ? somewhere within the identifier. This means approximately 10% of the 22482 taxon rows have some sort of identification qualifier.
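A tally like the one above could be reproduced with a few regular expressions. This is a minimal sketch, not the query actually used: the pattern table, the function names, and the sample names are all assumptions made for illustration, and only the literal `-type` spelling is covered, not its variants.

```python
import re
from collections import Counter

# Hypothetical patterns, one per qualifier style mentioned in this issue.
QUALIFIER_PATTERNS = {
    "cf.": re.compile(r"\bcf\."),
    "undiff.": re.compile(r"\bundiff\."),
    "-type": re.compile(r"-type\b"),  # variants of -type are not covered
    "?": re.compile(r"\?"),
}


def qualifier_kinds(taxon_name):
    """Return the qualifier styles found in a taxon name (possibly none)."""
    return [kind for kind, pattern in QUALIFIER_PATTERNS.items()
            if pattern.search(taxon_name)]


def tally(taxon_names):
    """Count how many names carry each qualifier style."""
    counts = Counter()
    for name in taxon_names:
        counts.update(qualifier_kinds(name))
    return counts


# Illustrative names only; apart from the first, they are not taken from this issue.
sample = ["Acer cf. A. pensylvanicum", "Pinus undiff.", "Ambrosia-type", "Quercus?", "Acer"]
print(tally(sample))
```

A name that carries two qualifier styles is counted once under each, so the per-style counts can add up to more than the number of qualified rows.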

The challenge is that the proposed solution above is difficult to apply. In general we can string-split on the cf. For example, Acer cf. A. pensylvanicum becomes `{"genus":"Acer", "identificationQualifier": "cf. A. pensylvanicum", "specificEpithet":"pensylvanicum"}`.

This requires the split, but then also identifying the epithet part, and knowing that Acer is a genus.
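As a rough sketch of that split, the function below handles only the simplest shape, `Genus cf. G. epithet`. The name `split_on_cf` is hypothetical, and both of its guesses (that everything before the cf. is a genus, and that the last unabbreviated word after it is the specific epithet) are assumptions that many Neotoma names will break.

```python
def split_on_cf(taxon_name):
    """Naively split a name shaped like 'Genus cf. G. epithet'.

    Returns None when the name has no cf. qualifier.
    """
    before, sep, after = taxon_name.partition("cf.")
    if not sep:
        return None

    record = {"identificationQualifier": ("cf. " + after.strip()).strip()}

    before = before.strip()
    if before:
        # Assumption: whatever precedes the cf. is a single genus name.
        record["genus"] = before

    # Assumption: after dropping abbreviations such as "A.", the last
    # remaining word is the specific epithet.
    words = [word for word in after.split() if not word.endswith(".")]
    if words:
        record["specificEpithet"] = words[-1]

    return record


print(split_on_cf("Acer cf. A. pensylvanicum"))
print(split_on_cf("Chenopodiaceae/Amaranthaceae cf. Atriplex"))
```

The first call reproduces the record shown for Acer cf. A. pensylvanicum. The second call mislabels the pair of families as a genus and Atriplex as a specific epithet, which is why the split alone is not enough.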

This problem is illustrated by Chenopodiaceae/Amaranthaceae cf. Atriplex. Here we need to know that the part before the cf. is a family (or two families) and that Atriplex is the genus.
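One possible way to recognise the family part is by its ending: botanical family names end in -aceae and zoological family names in -idae. The sketch below relies only on that convention. The helper names `guess_rank` and `ranks_before_cf` are hypothetical, and a suffix check is an assumption that cannot replace a lookup against an authoritative taxonomy, since it says nothing about genera or other ranks.

```python
# Family-name endings: -aceae (botanical) and -idae (zoological).
FAMILY_SUFFIXES = ("aceae", "idae")


def guess_rank(name_part):
    """Guess a rank from the ending of a name; None means 'unknown'."""
    if name_part.endswith(FAMILY_SUFFIXES):
        return "family"
    return None


def ranks_before_cf(taxon_name):
    """Split the text before the cf. on '/' and guess a rank for each part."""
    before = taxon_name.partition("cf.")[0].strip()
    parts = [part.strip() for part in before.split("/") if part.strip()]
    return [(part, guess_rank(part)) for part in parts]


print(ranks_before_cf("Chenopodiaceae/Amaranthaceae cf. Atriplex"))
print(ranks_before_cf("Acer cf. A. pensylvanicum"))
```

For Chenopodiaceae/Amaranthaceae cf. Atriplex both parts are flagged as families, while Acer comes back with no rank, so the genus case still needs outside knowledge.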

A partial list of variants:

- cf. Bocconia
- Cupressaceae cf. Juniperus communis
- Acanthaceae (type 1) cf. Acanthus
- Bembidion (Plataphus) cf. B. sulcipenne
- Agonum (Stictanchus) cf. A. bicolor
- cf. Armeria (type A)
- Achnanthes cf. oestrupii var. pungens
- Actinocyclus cf. normanii NJSS DME
- Amphora cf. micrometra FRITZ
- Agrypnia cf. A. vestita/Fabria sp.
- Artemisia cf. A. norvegica (type2)
- Asio cf. A. flammeus/A. otus
- Asteraceae subf. Asteroideae cf. Tussilago
- Bembidion cf. B. timidum/ B. versicolor
- Bison bison cf. B. b. antiquus
- Camelidae cf. Camelops sp.
- Carabidae cf. Blethisa cf. B.julii
- cf. Anas platyrhynchos/A. rubripes
- cf. Canis latrans irvingtonensis
- cf. Heliotropium/Lafoensia
- cf. Myrsine (tricolpate)
- cf. Ondatra zibethicus /meadensis
- cf. Primulaceae subf. Myrsinoideae
- Chenopodiaceae/Amaranthaceae cf. Atriplex
- Chrysolina sp.cf. Chrysolina marginata