The scientificName field contains the identificationQualifier even though you were able to break out the qualifier into its own field. The scientificName field should really contain only the most specifically unequivocally determined scientificName (not the identification information). If you can keep those separate, that is best, otherwise the migrator needs to do that.
1) For the determination "Quercus aff. agrifolia var. oxyadenia", identificationQualifier would be "aff. agrifolia var. oxyadenia" with accompanying values "Quercus" in genus, "agrifolia" in specificEpithet, "oxyadenia" in infraspecificEpithet, and "var." in rank.
2) For the determination "Quercus agrifolia cf. var. oxyadenia", identificationQualifier would be "cf. var. oxyadenia" with accompanying values "Quercus" in genus, "agrifolia" in specificEpithet, "oxyadenia" in infraspecificEpithet, and "var." in rank.
There are 3000+ records in Neotoma using the `cf.` modifier, 398 using `undiff.`, 1726 using `-type` (or variants), and 439 using `?` somewhere within the identifier. This means approximately 10% of the 22482 taxon rows have some sort of identification qualifier.
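Counts like these could be reproduced with a few regular expressions. The following is a minimal sketch, not the query actually used: the pattern set and the function names `find_qualifiers` and `count_qualifiers` are assumptions, and the sample names other than the `cf.` ones are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns: the real "-type" variants in Neotoma (for example
# "(type 1)" or "(type2)") would need broader rules than this.
QUALIFIER_PATTERNS = {
    "cf.": re.compile(r"(?<![A-Za-z])cf\."),
    "undiff.": re.compile(r"\bundiff\."),
    "-type": re.compile(r"-type\b"),
    "?": re.compile(r"\?"),
}


def find_qualifiers(name):
    """Return the labels of every qualifier pattern found in a taxon name."""
    return [label for label, pattern in QUALIFIER_PATTERNS.items()
            if pattern.search(name)]


def count_qualifiers(names):
    """Count how many names carry each qualifier (a name can count twice)."""
    counts = Counter()
    for name in names:
        counts.update(find_qualifiers(name))
    return counts


names = [
    "Acer cf. A. pensylvanicum",               # from the discussion
    "Chrysolina sp.cf. Chrysolina marginata",  # from the variant list
    "Quercus-type",                            # made-up example
    "Pinus undiff.",                           # made-up example
    "?Betula",                                 # made-up example
]
print(count_qualifiers(names))
```

Because one name can carry several qualifiers, the per-qualifier counts may sum to more than the number of affected rows.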
The challenge is that applying the proposed solution above is difficult. In general we can string-split on the `cf.`. For example, `Acer cf. A. pensylvanicum` becomes `{"genus": "Acer", "identificationQualifier": "cf. A. pensylvanicum", "specificEpithet": "pensylvanicum"}`.
This requires the split, but then also identifying the epithet part, and knowing that Acer is a genus.
This problem is illustrated with Chenopodiaceae/Amaranthaceae cf. Atriplex. Here we need to know that this is a family (or two families) and that Atriplex is the genus.
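The split and its limits can be sketched as follows. This is only an illustration under stated assumptions: `parse_cf_name` and the `unparsed` key are hypothetical names, and the genus is inferred only when the part after `cf.` is an abbreviated binomial whose initial matches the single word before it. Anything else is left unparsed, because deciding whether a word is a family or a genus needs a taxonomic lookup that a string split cannot provide.

```python
import re

# "cf." that is not the tail of a longer word; surrounding spaces are consumed.
CF_SPLIT = re.compile(r"\s*(?<![A-Za-z])cf\.\s*")
# An abbreviated binomial such as "A. pensylvanicum".
ABBREVIATED_BINOMIAL = re.compile(r"^([A-Z])\.\s*([a-z]+)$")


def parse_cf_name(name):
    """Split a taxon name on its first 'cf.' and fill in what can be inferred."""
    parts = CF_SPLIT.split(name, maxsplit=1)
    if len(parts) == 1:
        # No cf. qualifier present: nothing to separate out.
        return {"scientificName": name.strip()}

    before, after = (part.strip() for part in parts)
    record = {"identificationQualifier": "cf. " + after}

    match = ABBREVIATED_BINOMIAL.match(after)
    if (match and re.fullmatch(r"[A-Z][a-z]+", before)
            and before[0] == match.group(1)):
        # "Acer cf. A. pensylvanicum": the matching initial suggests a genus.
        record["genus"] = before
        record["specificEpithet"] = match.group(2)
    else:
        # The rank of the leading part is unknown without a lookup.
        record["unparsed"] = before
    return record


print(parse_cf_name("Acer cf. A. pensylvanicum"))
print(parse_cf_name("Chenopodiaceae/Amaranthaceae cf. Atriplex"))
```

The first call reproduces the record shown above; the second leaves `Chenopodiaceae/Amaranthaceae` in `unparsed`, since the split cannot tell that it names one or two families.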
A partial list of variants:
cf. Bocconia
Cupressaceae cf. Juniperus communis
Acanthaceae (type 1) cf. Acanthus
Bembidion (Plataphus) cf. B. sulcipenne
Agonum (Stictanchus) cf. A. bicolor
cf. Armeria (type A)
Achnanthes cf. oestrupii var. pungens
Actinocyclus cf. normanii NJSS DME
Amphora cf. micrometra FRITZ
Agrypnia cf. A. vestita/Fabria sp.
Artemisia cf. A. norvegica (type2)
Asio cf. A. flammeus/A. otus
Asteraceae subf. Asteroideae cf. Tussilago
Bembidion cf. B. timidum/ B. versicolor
Bison bison cf. B. b. antiquus
Camelidae cf. Camelops sp.
Carabidae cf. Blethisa cf. B.julii
cf. Anas platyrhynchos/A. rubripes
cf. Canis latrans irvingtonensis
cf. Heliotropium/Lafoensia
cf. Myrsine (tricolpate)
cf. Ondatra zibethicus /meadensis
cf. Primulaceae subf. Myrsinoideae
Chenopodiaceae/Amaranthaceae cf. Atriplex
Chrysolina sp.cf. Chrysolina marginata
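To see how far a simple split would get, the variants can be triaged by coarse shape first. The sketch below is one possible way to do that; the function name `describe_variant` and the feature labels are assumptions chosen for illustration, not an agreed classification.

```python
import re


def describe_variant(name):
    """List coarse structural features that complicate a plain split on 'cf.'."""
    features = []
    if name.lstrip().startswith("cf."):
        features.append("leading cf.")       # nothing determined before the qualifier
    if name.count("cf.") > 1:
        features.append("repeated cf.")      # nested uncertainty
    if "/" in name:
        features.append("alternatives (/)")  # more than one candidate taxon
    if "(" in name:
        features.append("parenthetical")     # subgenus, morphotype, or note
    if re.search(r"\b(var|subf|sp)\.", name):
        features.append("rank marker")       # var., subf., or sp.
    return features


for variant in [
    "cf. Anas platyrhynchos/A. rubripes",
    "Carabidae cf. Blethisa cf. B.julii",
    "Achnanthes cf. oestrupii var. pungens",
]:
    print(variant, "->", describe_variant(variant))
```

Names that return an empty list, such as `Cupressaceae cf. Juniperus communis`, still need a rank lookup for the part before `cf.`.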
The opening paragraph above is a comment from Issue #7, and the two numbered examples are how the DarwinCore terms describe the `identificationQualifier`.