Open Australis86 opened 2 years ago
Hybrid taxa are represented in 2 different ways in ChecklistBank. If they are hybrid formulas we do not parse them and instead keep the entire formula in the scientificName and indicate this by setting Name.type=HYBRID.
In case of named hybrids we do parse them, but do not want to have specific ranks for notho taxa. Instead we use the additional notho field, which specifies which part of the name holds the hybrid marker, i.e. at which rank the hybridisation occurred. In your example this is notho=specific:
https://api.catalogueoflife.org/dataset/3LR/taxon/84JPQ
"scientificName":"Amaranthus × ozanonii",
"authorship":"Priszter",
"rank":"species",
"genus":"Amaranthus",
"specificEpithet":" ozanonii",
"notho":"specific",
"combinationAuthorship":{
"authors":["Priszter"]
},
"code":"botanical",
"origin":"source",
"type":"scientific",
"remarks":"Hybrid taxon. Binomial generated by Species 2000.",
"parsed":true
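To illustrate how the parsed fields above relate to the displayed name, here is a minimal sketch. The function name `render_binomial` is hypothetical, and the "generic" value for notho is an assumption (only "specific" appears in the record above). This is not ChecklistBank code.

```python
HYBRID_MARKER = "\u00d7"  # the multiplication sign used as hybrid marker


def render_binomial(genus, specific_epithet, notho=None):
    """Rebuild a binomial from parsed parts, placing the hybrid marker
    in front of the name part named by `notho` (hypothetical helper)."""
    genus = genus.strip()
    epithet = specific_epithet.strip()
    if notho == "generic":  # assumed value, not shown in the record above
        genus = f"{HYBRID_MARKER} {genus}"
    elif notho == "specific":
        epithet = f"{HYBRID_MARKER} {epithet}"
    return f"{genus} {epithet}"


print(render_binomial("Amaranthus", " ozanonii", notho="specific"))
```

With notho left unset the same function returns the plain binomial, which is the form a search without the marker would use.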
Can you elaborate about the search problem? As we have properly parsed names the hybrid marker is not really in the way: https://www.checklistbank.org/dataset/1141/names?q=Amaranthus%20ozanonii
As for the remarks I don't know how these are generated. We simply copy all remarks from the source to the COL Checklist. But the remark is already like this in the source dataset in ChecklistBank. @gdower I assume you generate those comments in your ColDP script?
Thanks for the quick response. That is not quite the structure of the data I am seeing in the DwCA exports (e.g. from https://api.checklistbank.org/dataset/[key]/export). For example, this is the row for ozanonii from the Taxon.tsv file produced for Amaranthus:
dwc:taxonID 84JPQ
dwc:parentNameUsageID T8M
dwc:acceptedNameUsageID
dwc:originalNameUsageID
dwc:scientificNameID 1e7e47fd-4350-4a8a-8cf6-4438037218a7
dwc:datasetID 1141
dwc:taxonomicStatus accepted
dwc:taxonRank species
dwc:scientificName Amaranthus × ozanonii Priszter
dwc:scientificNameAuthorship Priszter
dwc:genericName Amaranthus
dwc:infragenericEpithet
dwc:specificEpithet ozanonii
dwc:infraspecificEpithet
dwc:cultivarEpithet
dwc:nameAccordingTo
dwc:namePublishedIn
dwc:nomenclaturalCode ICN
dwc:nomenclaturalStatus
dwc:taxonRemarks
dcterms:references http://www.worldplants.de/?deeplink=Amaranthus-ozanonii
So the only indication in the DwCA dataset that this is a hybrid taxon is in the scientificName field (where it has two spaces after the symbol). This is inconsistent with other data sources such as the KEW WCSP (which provides the Orchid family amongst others), which doesn't have the hybrid symbol in the name but does indicate hybrid taxa by the taxonRemarks column.
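As an interim workaround on the consumer side, the marker in dwc:scientificName can be detected and stripped before further processing. The sketch below is only an illustration: `split_hybrid_marker` is a hypothetical helper, it only looks for the × character (not a plain letter "x"), and it is meant for named hybrids, not for hybrid formulas.

```python
import re

HYBRID_MARKER = "\u00d7"  # multiplication sign


def split_hybrid_marker(scientific_name):
    """Return (name_without_marker, is_hybrid) for a named hybrid.

    Hypothetical helper: removes the marker and collapses the extra
    whitespace left around it.
    """
    is_hybrid = HYBRID_MARKER in scientific_name
    cleaned = scientific_name.replace(HYBRID_MARKER, " ")
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, is_hybrid


print(split_hybrid_marker("Amaranthus \u00d7  ozanonii Priszter"))
```

The boolean can then be stored in whatever local field marks hybrids, independent of how the upstream source flags them.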
If I then use the nameusage API (https://api.checklistbank.org/dataset/3LR/nameusage/search), the only way to get this nothospecies is to either set type to "WHOLE WORDS" (which then also returns at least one of its subspecies) or include the hybrid symbol in the search term and set type to 'EXACT'. Normally I send the following to the API:
params = {'q':search_term, 'content': 'SCIENTIFIC_NAME', 'maxRank':'SPECIES', 'type': 'EXACT', 'offset':0, 'limit':10}
r = self._session.get("https://api.checklistbank.org/dataset/3LR/nameusage/search", params=params, headers={'accept': 'application/json'})
On the other hand, if I were to do this with any of the orchid nothospecies, the above would work flawlessly as the hybrid symbol is not included (e.g. Cymbidium baoshanense).
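One way to cope with both forms on the client side is to try the exact search with and without the hybrid symbol. The sketch below only builds the parameter sets, reusing the parameters from the request above. `exact_query_variants` is a hypothetical helper, and the exact spacing the API expects around the symbol is an assumption.

```python
HYBRID_MARKER = "\u00d7"  # multiplication sign


def exact_query_variants(genus, epithet):
    """Build EXACT search parameter sets for a binomial, first without
    and then with the hybrid marker (hypothetical helper)."""
    base = {
        "content": "SCIENTIFIC_NAME",
        "maxRank": "SPECIES",
        "type": "EXACT",
        "offset": 0,
        "limit": 10,
    }
    plain = f"{genus} {epithet}"
    # Assumption: a single space on each side of the marker.
    marked = f"{genus} {HYBRID_MARKER} {epithet}"
    return [dict(base, q=q) for q in (plain, marked)]


for params in exact_query_variants("Amaranthus", "ozanonii"):
    print(params["q"])
```

Each parameter set can be passed as `params` to the same GET request, stopping at the first one that returns a result.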
Essentially I'd just like there to be a consistent way to identify nothospecies in the DwCA exports, rather than need to accommodate different formats depending on the upstream datasource.
Let me know if you need more information.
DwC is not an ideal format for taxonomic data. There has been an issue for dealing with notho taxa that has been closed because of lack of demand: https://github.com/tdwg/dwc/issues/43
This would be solved when using the ColDP archive which aligns better with our API. I guess we can add an unofficial col:notho term to the dwc export until there is an official dwc term. Would that help?
The Kew species (C. baoshanense) only mentions the hybrid nature in the remarks. That is completely opaque to us, so we treat it just as a regular species. This data is rather old and will be replaced by a newer version from Kew soon (@yroskov ?). In the newer WCVP and WCSP datasets the hybrid is correctly marked up:
See also https://github.com/CatalogueOfLife/general/blob/master/docs/NAMES.md#named-hybrid. I just realized there is also no notho term in ColDP. We anticipated that the hybrid symbol would prefix the respective part of the name. I would be in favor of adding a new col:notho term like we have in the API.
Thanks Markus. I have no experience with the ColDP format, but quickly looking at it suggests it may be far more useful to me going forward (alas, I will need to overhaul a chunk of code, but it looks like it will be worth it in the long run). All my scripts were built around the DwC format years ago and I wasn't aware of the development of the ColDP, so I appreciate you bringing it to my attention.
I am looking forward to the KEW WCSP dataset being updated and I appreciate the heads up as to which way this is going to go (the inclusion of the hybrid symbol), as I will need to update my scripts to accommodate this. I am glad that the outcome will be more consistent.
The inclusion of a notho field in the DwC (as an interim measure until I can rewrite to use ColDP) and ColDP formats would be greatly appreciated (and preferable over including the symbol in the corresponding epithet field), as I often use the individual fields with the database I work with rather than parsing the scientificName (the system I work with does not accommodate the author, only the botanical Latin name).
If it is straightforward to make the namesearch API insensitive to the presence of the hybrid symbol, that would be much appreciated.
For the namesearch I don't think I can make that happen easily. But I will be working on a new matching method in the API that would be the recommended way to look up a single matching name in any dataset. It will be based on the NamesIndex and should allow for small variations like the hybrid marker and epithet gender.
@gdower I assume you generate those comments in your ColDP script?
Yes, we add that comment.
Does that mean the binomial does not exist in World Plants? It appears just like that here: https://www.worldplants.de/world-plants-complete-list/complete-plant-list/?name=Amaranthus-ozanonii
Hybrid taxon. Binomial generated by Species 2000.
Maybe we should replace Species 2000 with Catalogue of Life?
For the namesearch I don't think I can make that happen easily. But I will be working on a new matching method in the API that would be the recommended way to look up a single matching name in any dataset. It will be based on the NamesIndex and should allow for small variations like the hybrid marker and epithet gender.
Thanks Markus. I look forward to that becoming available.
Maybe we should replace Species 2000 with Catalogue of Life?
I'll remove the binomial comment. It made sense in the old web interface that couldn't render the × symbol for hybrids, although maybe I should keep the Hybrid taxon remark.
@mdoering Thanks Markus. I'll use those samples to do some testing.
Describe the problem: The Darwin Core Archive exports for the Amaranthus genus (and likewise the COL webpages) do not indicate that some of the taxa are hybrid taxa except by the hybrid symbol in the scientificName. The taxonRank is given as species, rather than nothospecies, and the taxonRemarks field (which usually includes a note to say "Hybrid taxon" and the parentage, if available) is empty. Looking at the upstream datasource, the parentage is given for at least one of the hybrid taxa (Amaranthus ozanonii).
The inclusion of the hybrid symbol makes searching for these entries via the API difficult, as most other genera (e.g. Cymbidium from KEW WCSP) do not include this. I'd much prefer if the representation of natural hybrids was kept consistent by excluding the use of the hybrid symbol, correctly setting the rank and including the parentage where available in the taxon remarks. This makes it much easier to work with the dataset and the API.
Link to affected CoL webpages: https://www.catalogueoflife.org/data/taxon/84JPQ