I was hoping to start a discussion on how to handle hybrids in datasets when those hybrids are not currently recognized in catalogues. It seems there are some well-known hybridizing groups with a wealth of literature behind them; however, their scientific names are not structured in a way that allows them to be resolved at large scale. Hybrids are of particular interest in biogeography because they provide a means of testing questions about interaction intermediates, ecological viability, distribution compared to parent populations, etc., so losing them during taxonomic resolution would be a significant loss.
The issue first appears in parsing, where the format of the name can be either:
genus + specificEpithet × genus + specificEpithet OR genus + specificEpithet × specificEpithet
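As a rough illustration of what a parser has to recognize here, the sketch below splits both forms into their two parent binomials, expanding the abbreviated form by reusing the first genus. It is a minimal Python sketch, not gn-parser's or gbif-parse's actual logic: it assumes a single × (or a plain x, which some datasets use instead) between the parents, and `split_hybrid_formula` is a hypothetical helper name.

```python
import re

# Matches "Genus epithet × Genus epithet" or "Genus epithet × epithet".
# Assumption: exactly one hybrid sign (× or a plain "x") separates the parents.
HYBRID_FORMULA = re.compile(
    r"^\s*(?P<g1>[A-Z][a-z]+)\s+(?P<e1>[a-z-]+)\s+[×x]\s+"
    r"(?:(?P<g2>[A-Z][a-z]+)\s+)?(?P<e2>[a-z-]+)\s*$"
)


def split_hybrid_formula(name):
    """Return the two parent binomials, or None if name is not a hybrid formula."""
    m = HYBRID_FORMULA.match(name)
    if m is None:
        return None
    # In the abbreviated form the second parent has no genus, so reuse the first.
    genus2 = m.group("g2") or m.group("g1")
    return (f"{m.group('g1')} {m.group('e1')}", f"{genus2} {m.group('e2')}")


print(split_hybrid_formula("Tragopogon dubius × Tragopogon porrifolius"))
print(split_hybrid_formula("Tragopogon dubius × porrifolius"))
```

Both calls should yield the same pair, which is what makes the two forms interchangeable for downstream matching.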
While gn-parser seems to be capable of handling this type of name, gbif-parse seems to fail. I'll show the first name form here; the other variation yields the same results.
echo -e "\tTragopogon dubius × Tragopogon porrifolius" | nomer replace gn-parse
Tragopogon dubius × Tragopogon porrifolius
echo -e "\tTragopogon dubius × Tragopogon porrifolius" | nomer replace gbif-parse
java.lang.RuntimeException: failed to apply taxon
I've also checked this with other tools, the GBIF name parser web page and R, and names with this structure fail to parse there too. GBIF appears to take a conservative approach to this issue, assigning the name to a higher taxonomic rank (genus: Tragopogon). For example: https://www.gbif.org/occurrence/2573489546
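A pipeline that wants the same conservative behavior as a fallback could keep only the genus when both parents share it. The function below is a hypothetical sketch (`genus_fallback` is not part of GBIF or nomer) that mirrors what the linked occurrence shows, not GBIF's actual algorithm; for intergeneric formulas it returns None rather than guessing a rank.

```python
import re

# Captures the leading genus of each side of "A b × C d" or "A b × d".
SIDES = re.compile(
    r"^\s*([A-Z][a-z]+)\s+\S+\s+[×x]\s+(?:([A-Z][a-z]+)\s+)?\S+\s*$"
)


def genus_fallback(name):
    """Return the genus shared by both parents, or None if they differ or name doesn't match."""
    m = SIDES.match(name)
    if m is None:
        return None
    first = m.group(1)
    # Abbreviated form: the second parent implicitly shares the first genus.
    second = m.group(2) or first
    return first if first == second else None


print(genus_fallback("Tragopogon dubius × Tragopogon porrifolius"))
```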
Even when parsing succeeds, a catalogue is unlikely to have a hybrid registered as a name outside of cultivars.
echo -e "\tTragopogon dubius × Tragopogon porrifolius" | nomer replace gn-parse | nomer append wfo
Tragopogon dubius × Tragopogon porrifolius NONE Tragopogon dubius × Tragopogon porrifolius
This is not surprising, since species concepts get very muddled here, but it means the WFO catalogue cannot resolve these types of names. There are some hybrids registered in the WFO catalogue, but, at least to me, they don't appear to be standardized in a way that fits how name alignment usually proceeds. See, for example: http://www.worldfloraonline.org/taxon/wfo-4000042576
This is further complicated by the fact that not all groups have registered hybrids in their taxonomy, as is the case for Tragopogon.
I am wondering whether a way around this would be to break the name into two pieces, (Genus + specificEpithet) and (Genus + specificEpithet), and resolve each individually. If both resolve to valid scientificNames, we could flag the record as a "confident" hybrid? This isn't ideal, but it would allow finer-grained resolution of names like these. Just my thoughts; I'm curious to hear whether anyone has encountered this issue and has ideas on resolving names of this composition.
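The split-and-resolve idea could be sketched as follows, assuming each parent half has been run through nomer separately and produces exactly one tab-separated output row, with an unmatched name carrying the relation NONE as in the wfo output above. The function names and the labels `confident_hybrid`, `partial_hybrid` and `unresolved` are hypothetical, chosen only for illustration.

```python
def parent_resolved(nomer_line):
    """True if a tab-separated nomer output row does not report the NONE relation.

    Assumption: an unmatched name shows NONE as one of its fields, as in the
    wfo example in the post.
    """
    return "NONE" not in nomer_line.rstrip("\n").split("\t")


def classify_hybrid(parent_rows):
    """Label a hybrid formula from the nomer rows of its two parents (hypothetical labels)."""
    resolved = [parent_resolved(row) for row in parent_rows]
    if len(resolved) == 2 and all(resolved):
        return "confident_hybrid"  # both parents are valid names in the catalogue
    if any(resolved):
        return "partial_hybrid"  # only one parent matched
    return "unresolved"


row = "\tTragopogon dubius × Tragopogon porrifolius\tNONE\t\tTragopogon dubius × Tragopogon porrifolius"
print(parent_resolved(row))
```

The parent rows could come from the same pipeline used above, e.g. `printf '\t%s\n' "Tragopogon dubius" "Tragopogon porrifolius" | nomer replace gn-parse | nomer append wfo`. If nomer returns several rows for one parent, they would need to be grouped per parent before classifying.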