Closed bart-v closed 1 year ago
I agree that we should have a "scientific name and identifier mismatch". In case of conflict, it would probably make sense to privilege the scientific name over the ID because it would make it more transparent to users. A scientific name is human readable and doesn't require checking an external source.
I would argue the other way around and prefer the identifier, as this is less of an interpretation from the GBIF side and entirely in the hands of the publisher.
@MattBlissett , please look at https://www.gbif.org/occurrence/4399021367 (from a dataset I happened to be checking today).
The publisher gives "Sterna anaethetus" and ID "urn:lsid:marinespecies.org:taxname:212605". The accepted name is "Onychoprion anaethetus" with ID "urn:lsid:marinespecies.org:taxname:567792". GBIF has interpreted (added) the genus as "Onychoprion" and interpreted (added) the taxonomicStatus as "Synonym", but has not changed either the original scientificName or the original scientificNameID.
(1) How would this record look if GBIF preferenced ID over name? (2) What would happen if the publisher had "corrected" the ID but not the name, i.e. gave "Sterna anaethetus" and "urn:lsid:marinespecies.org:taxname:567792"?
Thank you all for guiding this work.
are you also proposing that GBIF track all the scientificNameID sources and compare the one selected for a record with the name, or just WoRMS APHIA IDs?
Not quite. I suggest we use configuration to enable certain patterns, e.g. urn:lsid:marinespecies.org:* mapped to WoRMS. I think it would be wise to do an impact assessment like we did above, before enabling others. I anticipate IPNI, DynTaxa, IndexFungorum would be good candidates though.
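To make the configuration idea concrete, here is a minimal sketch of pattern-based routing. Only the WoRMS pattern comes from this thread; the dictionary layout and the function name `checklist_for` are assumptions for illustration, not the actual GBIF pipelines configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration: identifier pattern -> checklist name.
# Only the WoRMS entry is taken from the discussion; the structure is assumed.
ID_PATTERNS = {
    "urn:lsid:marinespecies.org:*": "WoRMS",
}


def checklist_for(identifier):
    """Return the configured checklist for an identifier, or None if no pattern matches."""
    for pattern, checklist in ID_PATTERNS.items():
        if fnmatchcase(identifier, pattern):
            return checklist
    return None
```

An identifier that matches no pattern returns None, which is the case the proposed "ignored" flag would cover.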
what will happen to the original scientificName and scientificNameID entries in a record where GBIF detects a conflict? Will both continue to appear in the interpreted record?
Like @mdoering I think I would lean towards the values found using the scientificNameID since it's less ambiguous and puts the responsibility and control on the publisher - especially if we provide sensible flags that make it easy to detect and fix. The verbatim values are always available of course.
Taking into consideration everyone's comments, I think we would need the following flags to be transparent and to make it easy to locate problematic records (edited to accommodate suggestions from @MattBlissett and @ymgan below).
Issue | Description |
---|---|
SCIENTIFIC_NAME_ID_IGNORED | The scientificNameID uses a pattern that is not configured in GBIF. The backbone lookup was performed using the names on the record and the scientificNameID is nullified in the interpreted record. |
TAXON_CONCEPT_ID_IGNORED | The taxonConceptID uses a pattern that is not configured in GBIF. The backbone lookup was performed using the names on the record and the taxonConceptID is nullified in the interpreted record. |
SCIENTIFIC_NAME_ID_NOT_FOUND | The scientificNameID matched a known pattern, but it was not found in the associated checklist. The backbone lookup was performed using the names on the record ignoring the ID and the scientificNameID is nullified in the interpreted record. This may indicate a poorly formatted identifier or may be caused by a newly created ID that isn't yet known in the version of the published checklist. |
TAXON_CONCEPT_ID_NOT_FOUND | The taxonConceptID matched a known pattern, but it was not found in the associated checklist. The backbone lookup was performed using the names on the record ignoring the ID and the taxonConceptID is nullified in the interpreted record. This may indicate a poorly formatted identifier or may be caused by a newly created ID that isn't yet known in the version of the published checklist. |
SCIENTIFIC_NAME_AND_ID_INCONSISTENT | The scientificName provided in the occurrence record does not precisely match the name in the registered checklist when using the scientificNameID or taxonConceptID to look it up. Publishers are advised to check the ID is correct, or update the formatting of the names on their records. |
TAXON_MATCH_NAME_AND_ID_AMBIGUOUS | The GBIF Backbone concept was found using the scientificNameID or taxonConceptID but it differs from what would have been found if the classification names on the record were used. This may indicate a gap in the GBIF backbone, a poor mapping between the checklist and the backbone, or a mismatch between the classification names and the declared IDs (scientificNameID or taxonConceptID) on the occurrence record itself. |
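The flag descriptions above can be read as a small decision procedure. The following is a rough sketch of that reading for scientificNameID only; the function name, its parameters and the simple equality checks are assumptions for illustration and not the pipelines implementation.

```python
def taxon_flags(id_pattern_configured, id_found, checklist_name, record_name,
                backbone_key_from_id, backbone_key_from_names):
    """Return the set of proposed flags for the scientificNameID on one record.

    All arguments are hypothetical inputs that a real pipeline would derive
    from its configuration, the checklist lookup and the backbone match.
    """
    if not id_pattern_configured:
        # Unknown identifier scheme: fall back to the names on the record.
        return {"SCIENTIFIC_NAME_ID_IGNORED"}
    if not id_found:
        # Known scheme, but the ID is absent from the associated checklist.
        return {"SCIENTIFIC_NAME_ID_NOT_FOUND"}
    flags = set()
    if checklist_name != record_name:
        flags.add("SCIENTIFIC_NAME_AND_ID_INCONSISTENT")
    if backbone_key_from_id != backbone_key_from_names:
        flags.add("TAXON_MATCH_NAME_AND_ID_AMBIGUOUS")
    return flags
```

For a record giving "Sterna anaethetus" with the ID of "Onychoprion anaethetus", and with both routes reaching the same backbone concept, this sketch returns only SCIENTIFIC_NAME_AND_ID_INCONSISTENT.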
Please keep the feedback coming - especially if you disagree. Thank you.
@timrobertson100, so the answer to my first question is "No different"?
We would lookup the WoRMS ID 212605 in the checklist published to GBIF: https://www.gbif.org/species/155305680/verbatim
This checklist (like all others) is matched to the backbone and we can take that mapping to find the matching backbone entry for that name, which would be the nubKey property in this API response: https://api.gbif.org/v1/species/155305680
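As a minimal sketch of that lookup: the URL and the nubKey property are taken from the comment above, while the helper names are hypothetical. `fetch_usage` needs network access and is not called here.

```python
import json
from urllib.request import urlopen

API = "https://api.gbif.org/v1/species/{key}"


def fetch_usage(key):
    """Fetch one checklist usage from the GBIF species API (needs network access)."""
    with urlopen(API.format(key=key)) as response:
        return json.load(response)


def nub_key(usage):
    """Return the backbone key from a parsed response, or None when no match is recorded."""
    return usage.get("nubKey")
```

Calling `nub_key(fetch_usage(155305680))` would then give the backbone key for the WoRMS usage, or None when the checklist entry has no backbone match.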
In this case that matching is missing for unknown reasons. I will investigate, because the backbone does list the name Sterna anaethetus Scopoli, 1786, sourced from ITIS and regarded as a synonym of Onychoprion anaethetus subsp. anaethetus (Scopoli, 1786).
The point is that we will only link via name and use the name as it is classified in the backbone, not in the source.
If the publisher changed the identifier to the ID of the accepted name, the occurrence would be listed as Onychoprion anaethetus instead, and if the given name remained Sterna anaethetus, a mismatch flag would be raised.
So regardless of whether the occurrence identification is given as name strings or identifiers, we must still do a matching to our backbone. The only exception would be if the name/taxon identifier was given as a backbone key directly; then we would not have to do any matching at all. We do regard the checklist matching as a little safer than plain name matching though.
The point is that we will only link via name and use the name as it is classified in the backbone, not in the source.
Or in other words, the "Interpreted" name is the name according to the GBIF backbone, whether we found it by matching strings or looking up an identifier.
Bob's other point:
GBIF has interpreted (added) the genus as "Onychoprion" and interpreted (added) the taxonomicStatus as "Synonym", but has not changed either the original scientificName or the original scientificNameID.
Should we be blanking the interpreted scientificNameID in the situations where it is not used, i.e. for SCIENTIFIC_NAME_ID_IGNORED and SCIENTIFIC_NAME_ID_NOT_FOUND? It remains, of course, in the "Original"/verbatim data.
Edited to add (Tim Robertson): These suggestions have been included in the issue descriptions above
@timrobertson100 and @mdoering , leaving aside the interesting question of why WoRMS should get priority in this proposed change, please note that the OBIS manual (https://manual.obis.org/darwin_core.html#taxonomy-and-identification) says
"scientificName (required term) should always contain the originally recorded scientific name, even if the name is currently a synomym [sic]. This is necessary to be able to track back records to the original dataset... A WoRMS LSID should be added in scientificNameID (required term), OBIS will use this identifier to pull the taxonomic information from the World Register of Marine Species (WoRMS) into OBIS, such as the taxonomic classification and the accepted name in case of invalid names or synonyms."
I read that as saying that OBIS contributors can use the original scientificName and the accepted scientificNameID, which guarantees mismatches. Maybe @albenson-usgs could comment?
I read that as saying that OBIS contributors can use the original scientificName and the accepted scientificNameID, which guarantees mismatches. Maybe @albenson-usgs could comment?
I rather read that as stick with the original name, even if it is considered a synonym now, also for the identifier:
"OBIS will use this identifier to pull the taxonomic information from WoRMS into OBIS, such as the accepted name in case of invalid names or synonyms."
@mdoering, OK, so an OBIS user enters the original name and the LSID for that original name, and if it's a synonym, then OBIS replaces the original name and ID with the accepted name and ID? Which of these goes into the OBIS records shared with GBIF - the original name + its ID, or the accepted name +its ID?
I read that as saying that OBIS contributors can use the original scientificName and the accepted scientificNameID, which guarantees mismatches.
Thank you Bob! OBIS nodes use the scientificNameID of the original scientificName if the name provided in the record is a synonym. For example:
scientificName: Diogenichthye
scientificNameID: urn:lsid:marinespecies.org:taxname:397410
We are specifically asked not to change the scientificName to its accepted name and/or use the LSID of the accepted name, as shown below:
scientificName: Diogenichthys
scientificNameID: urn:lsid:marinespecies.org:taxname:125820
@mdoering, does the GBIF backbone perfectly follow the checklists (allowing for update delays) in all cases, so that matching is done with names and classifications in checklists, and the checklists amount to subsets of the backbone?
@ymgan, which of these goes into the OBIS records shared with GBIF - the original name + its ID, or the accepted name +its ID?
which of these goes into the OBIS records shared with GBIF - the original name + its ID, or the accepted name +its ID?
The former goes to GBIF and OBIS: original name + its ID
scientificName: Diogenichthye
scientificNameID: urn:lsid:marinespecies.org:taxname:397410
@mdoering, does the GBIF backbone perfectly follow the checklists (allowing for update delays) in all cases, so that matching is done with names and classifications in checklists, and the checklists amount to subsets of the backbone?
The backbone is currently only updated twice per year using most of the important lists, but not everything that is out there. WoRMS, ITIS, IPNI & ZooBank are included for example. That means there clearly are names not included in the backbone, especially all newly described ones.
The lists that are candidates for name identifiers are updated at various frequencies defined by their publishers. Whenever a new version of a list is imported we match it to the backbone during the import process.
Thank you, @ymgan. Does OBIS share with GBIF like this?
scientificName = original name
scientificNameID = original name's ID
acceptedNameUsage = accepted name
acceptedNameUsageID = accepted name's ID
That seems very clear to me. I'm finding the proposed changes to GBIF's interpretation protocol confusing, unless GBIF would do
scientificName = interpreted name
scientificNameID = interpreted name's ID
verbatimScientificName (not DwC) = original name
verbatimScientificNameID (also not DwC) = original name's ID
Thank you, @mdoering.
Thank you @Mesibov for these interesting questions!
Does OBIS share with GBIF like this?
scientificName = original name
scientificNameID = original name's ID
acceptedNameUsage = accepted name
acceptedNameUsageID = accepted name's ID
According to my understanding, no - because the accepted name and accepted name's ID may become unaccepted in the future. So we are trained/asked to provide scientificName, scientificNameID along with scientificNameAuthorship, kingdom, taxonRank, taxonRemarks of the original name.
Thank you, @ymgan. So the looking-up of the original name's ID in WoRMS is done twice, independently. OBIS does it (according to the Manual) for its own purposes but doesn't share the result. GBIF does it and may change what it receives from OBIS (in the case of unaccepted names) but does not make clear what's happened (see the "Sterna anaethetus" example, above).
and may change what it receives from OBIS
It may just be the phrasing, but there may be a misunderstanding to clarify. A data publisher publishes a dataset, which is registered in the GBIF registry and linked to the OBIS network. Both GBIF and OBIS ingest the same dataset from the source (generally a GBIF IPT) and process it for the services each infrastructure provides.
@timrobertson100, thank you for that clarification. I misinterpreted the source of the record I cited above. The publisher (https://www.gbif.org/publisher/5fa89f68-9af0-4a0d-8998-ea39695c1db9) is "CSIRO NCMI IDC / OBIS Australia", so I assumed that "OBIS Australia", a node in the OBIS network, was co-publisher.
OBIS provides useful information on the fields it provides for its own downloads: https://obis.org/data/access/.
scientificName is derived from the provided scientificNameID, but there is also an originalScientificName field for the name as provided. There is an APHIA ID field with "the valid name based on the scientificNameID or derived by matching the provided scientificName with WoRMS".
I'm not sure I follow the bit after "or". I think it might apply in cases where (contrary to the requirement) only a name was provided and not an ID. There is no originalScientificNameID field.
OBIS also checks name and ID fields for quality: https://github.com/iobis/obis-qc. OBIS checks
So yes, the same tests would be done twice, independently, for datasets published through OBIS and through GBIF, if GBIF goes ahead with preferencing ID to name in its lookups.
I think the proposed flags https://github.com/gbif/pipelines/issues/217#issuecomment-1696930580 are sensible. I can see the point of prioritising the IDs for interpretation (in the cases where they don't match the name). If we get too many confused users, we should consider changing the behaviour or rolling back on the changes.
Thank you so much!! I agree with @ManonGros
I think the proposed flags https://github.com/gbif/pipelines/issues/217#issuecomment-1696930580 are sensible. Maybe it will become clearer when we see how these flags apply to the examples encountered.
Just throwing out an idea: I am wondering if it makes sense for the flags to follow the vocabulary from BDQ TG2 https://github.com/tdwg/bdq/issues/152#issue-354943638? For example:
SCIENTIFIC_NAME_AND_ID_INCONSISTENT
TAXON_MATCH_NAME_AND_ID_AMBIGUOUS
Do these capture the same meaning?
On the other hand, if there are things worth mentioning in the OBIS manual that could help to prevent certain issues identified in the test run (for future data), it would be great if someone could create an issue at https://github.com/iobis/manual (I hope this doesn't sidetrack the conversation).
Thank you so much again!
Thanks @ymgan - That makes good sense. I'll adjust the labels above accordingly.
Morning all! Just catching up 😊
Like @mdoering I think I would lean towards the values found using the scientificNameID since it's less ambiguous and puts the responsibility and control on the publisher - especially if we provide sensible flags that make it easy to detect and fix. The verbatim values are always available of course.
Yes I agree with this completely. It sounds like that is the conclusion that was reached but wanted to voice my support. It is not up to the aggregators to correct issues that might arise but instead to flag them for publishers to address.
I also agree that the proposed flags https://github.com/gbif/pipelines/issues/217#issuecomment-1696930580 are sensible. I don't have any changes to suggest or ones to add at this time.
I do think it's worth considering that while what Ming states is the current best practice ("OBIS nodes use the scientificNameID of the original scientificName if the name provided in the record is a synonym") what Bob provided was the original instruction ("scientificName (required term) should always contain the originally recorded scientific name, even if the name is currently a synomym [sic]. This is necessary to be able to track back records to the original dataset... A WoRMS LSID should be added in scientificNameID (required term)") and therefore there will be datasets following that practice. Note that verbatimIdentification is a relatively new term in Darwin Core and so the process Bob mentioned was the only way to keep the original name as it was provided when datasets were matched to WoRMS during OBIS processing. Now we have improved options but there will be datasets that had to follow that original instruction and it may not be possible to update them easily. I think the SCIENTIFIC_NAME_AND_ID_INCONSISTENT flag will be appropriately applied so I think there is nothing we need to change for this. I just want us to be aware.
Going back to this issue (https://github.com/gbif/pipelines/issues/217#issuecomment-1680684346) that Tim identified at the beginning, my understanding is that this would get a SCIENTIFIC_NAME_AND_ID_INCONSISTENT flag. I think what might be difficult for nodes, and I'm not sure what we could do to help them, is that the flag won't help them know that the fuzzy match has led to an unexpected result. Perhaps this is where the data providers must come in to check these.
Finally I do make use of verbatimIdentification so I wouldn't advise that being used as Bob has described in https://github.com/gbif/pipelines/issues/217#issuecomment-1697021413. As an example here is a species lookup table I've been working with recently where I needed to make use of verbatimIdentification.
sciname_crosswalk.csv
@albenson-usgs, many thanks for your comments.
Please note that verbatimIdentification is not the same as "verbatimScientificName". vI (https://dwc.tdwg.org/terms/#dwc:verbatimIdentification) is a place for informal names, guesses, vernacular names etc as well as formal scientific names and "is meant to be used in addition to dwc:scientificName (and dwc:identificationQualifier etc.), not instead of it".
vI is a handy field for data checkers because it allows us to say to compilers: "coral sp. 1a is not appropriate for scientificName. Please put coral sp. 1a in verbatimIdentification and put a formal scientific name for the coral taxon in scientificName".
A record could therefore have
I suggest that to avoid confusion, all 5 fields should be present in the record made available to data end-users.
Thank you all for contributing so actively to this thread.
I'll now implement those flags, prepare a configuration that handles the WoRMS LSIDs, and process all datasets using them. I am sure we'll refine this again, but getting GBIF and OBIS more closely aligned should be a good start.
I have a remaining question - how strict should the scientificName comparison be when flagging differences?
Consider e.g. a record with Aus bus in the name, and an LSID that returns Aus bus L. 1771. Should I flag anything not exactly the same, or should I parse both sides and compare the resulting canonical? I'm tempted to suggest the latter as a start to avoid too many nuisance flags, perhaps making it stricter in the future. Thoughts?
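To illustrate the "parse both sides and compare the canonical" option, here is a deliberately naive sketch. It is not the GBIF name parser: the function names and the rank-marker list are assumptions, and it will mishandle cases such as subgenera in parentheses or lowercase author particles like "von" or "de".

```python
# Rank markers to drop when building the canonical name (assumed, incomplete list).
RANK_MARKERS = {"subsp.", "subsp", "ssp.", "ssp", "var.", "var"}


def canonical(name):
    """Naively reduce a scientific name to its genus plus lowercase epithets."""
    tokens = name.split()
    if not tokens:
        return ""
    parts = [tokens[0]]
    for token in tokens[1:]:
        if token.lower() in RANK_MARKERS:
            continue  # skip markers such as "subsp." or "var."
        if not (token.replace("-", "").isalpha() and token.islower()):
            break  # authorship, year or parenthesis: stop collecting epithets
        parts.append(token)
    return " ".join(parts)


def names_consistent(record_name, checklist_name):
    """Compare two names on their canonical form only, ignoring authorship."""
    return canonical(record_name) == canonical(checklist_name)
```

With this comparison, Aus bus and Aus bus L. 1771 count as consistent, so only genuine differences in the name itself would be flagged.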
@timrobertson100, please report back here or in the GBIF forum/data blog with the flag tallies you find for records with WoRMS IDs.
DwC scientificName's recommendation is "with authorship and date information if known". Authorship and authorship/date are missing in a large proportion of datasets I see or are put in scientificNameAuthorship. Please parse to canonical. I'm assuming your parser will deal properly with "subsp/subsp./ssp/ssp...." etc.
Thanks, @Mesibov - that was my intuition too.
@mdoering, don't you think that comparing interpreted values will risk piling one error on top of another? Interpreted values are sometimes wrong, and in any case are not the responsibility of the data publisher.
Sorry, I have been away for a while.
So the looking-up of the original name's ID in WoRMS is done twice, independently. OBIS does it (according to the Manual) for its own purposes but doesn't share the result.
The results are shared. OBIS currently performs lookup of the provided scientificNameID in WoRMS (usually a WoRMS LSID but can also be a BOLD or NCBI ID), or matches the scientific name using the WoRMS API in case the ID is missing or invalid. We then replace the full taxonomy with the taxonomy of the accepted name. So the user sees:
AphiaID: interpreted WoRMS ID for the accepted name
scientificName and higher ranks: accepted names according to WoRMS
originalScientificName: scientificName as provided
scientificNameID: scientificNameID as provided
For consistency we should probably rename AphiaID to scientificNameID and scientificNameID to originalScientificNameID (or some other form indicating verbatim value).
I read that as saying that OBIS contributors can use the original scientificName and the accepted scientificNameID, which guarantees mismatches.
No, if the name is a synonym we recommend providing the ID for the synonym.
Consider e.g. a record with Aus bus in the name, and an LSID that returns Aus bus L. 1771. Should I flag anything not exactly the same, or should I parse both sides and compare the resulting canonical? I'm tempted to suggest the latter as a start to avoid too many nuisance flags, perhaps making it stricter in the future. Thoughts?
Yes, please use the canonical. The OBIS recommendation is still not to provide the authorship (despite the DwC definition).
https://github.com/gbif/pipelines/issues/217#issuecomment-1696930580
The flags make sense, I'll see if we can implement them as well.
@pieterprovoost, many thanks for participating, and I apologise for my use of "sharing". At the time I thought OBIS shared its processing results with GBIF. It doesn't. Publishers independently publish to OBIS and to GBIF, so the processing of scientificNameID happens in parallel. This comment deals with that.
@timrobertson100 generated a file that has taxonomic details for records with scientificNameID populated with WoRMS LSIDs. These are the ca 19000 cases in which there would be a disagreement in names if GBIF looked up the scientificNameID in WoRMS and compared it to the provided (original) scientificName.
The file contains numerous instances in which well-formed, formal taxonomic names have more than one scientificNameID. Here's one example. It's "Coscinoderma matthewsi", a misspelling of Coscinoderma mathewsi (Lendenfeld, 1886), urn:lsid:marinespecies.org:taxname:165090.
No. of records | scientificName | scientificNameID | WoRMS name |
---|---|---|---|
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165091 | Coscinoderma pesleonis (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165092 | Coscinoderma sinuosum (unaccepted synonym of Coscinoderma lamarcki) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165093 | Hippospongia ammata (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165094 | Hippospongia anfractuosa (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165095 | Hippospongia canaliculata (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165096 | Hippospongia cerebrum (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165097 | Hippospongia cylindrica (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165098 | Hippospongia decidua (unaccepted synonym of Hyattella sinuosa) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165099 | Hippospongia densa (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165100 | Hippospongia derasa (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165101 | Hippospongia equina (unaccepted synonym of Hippospongia communis) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165102 | Hippospongia equina var. meandriniformis (unaccepted synonym of Spongia (Spongia) barbara) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165103 | Hippospongia equina var. micropora (unaccepted synonym of Hippospongia micropora) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165104 | Hippospongia fistulosa (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165105 | Hippospongia flabellum (unaccepted synonym of Hippospongia canaliculata) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165106 | Hippospongia galea (accepted) |
1 | Coscinoderma matthewsi | urn:lsid:marinespecies.org:taxname:165107 | Hippospongia gossypina (accepted) |
This looks to me like an incremental fill-down error in a spreadsheet. The data compiler entered the correct LSID ("...165090"), then filled down incrementally over the next 17 records.
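A check for this kind of fill-down pattern could look roughly like the sketch below. It assumes rows arrive in file order as (scientificName, scientificNameID) pairs and that every ID ends in an integer after the last colon; the function names and the `min_length` threshold are hypothetical.

```python
from itertools import groupby


def trailing_number(lsid):
    """Return the integer after the last colon of an LSID (assumes it is numeric)."""
    return int(lsid.rsplit(":", 1)[1])


def fill_down_runs(rows, min_length=3):
    """Yield (name, ids) where one name carries consecutive, incrementing IDs.

    rows is an iterable of (scientificName, scientificNameID) in file order.
    """
    for name, group in groupby(rows, key=lambda row: row[0]):
        ids = [trailing_number(lsid) for _, lsid in group]
        run = [ids[0]]
        for current in ids[1:]:
            if current == run[-1] + 1:
                run.append(current)
                continue
            if len(run) >= min_length:
                yield name, run
            run = [current]
        if len(run) >= min_length:
            yield name, run
```

A single name repeated with IDs that step up by one on each row is a strong hint that only the first ID in the run was entered deliberately.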
Because @timrobertson100 did not include a record ID I can't track the records back to GBIF or OBIS. However, the OBIS page for this sponge species says there were no invalid scientificNameIDs and no dropped records. Can you explain what is likely to have happened when the above records with incorrect scientificNameIDs were processed by OBIS? Did OBIS process each record according to its scientificNameID, thus incorrectly assigning the record to the wrong species?
Note that GBIF processed the records according to scientificName and correctly assigned all 17 records to Coscinoderma mathewsi (Lendenfeld, 1886).
@pieterprovoost, I found the dataset and the "Coscinoderma matthewsi" occurrences in GBIF. The dataset is Vulnerable marine ecosystems in the South Pacific Ocean region and was published by New Zealand's NIWA. Example occurrence here. The OBIS link.
@pieterprovoost, I've answered my own question: OBIS processed "Coscinoderma matthewsi" as Coscinoderma pesleonis based on the incorrect scientificNameID "...165091". The OBIS record id is d47e860b-eff5-42e2-9534-6d24a4810767 from the dataset 6c813a6c-86f7-4d45-beb5-33eebd8de938. How will OBIS correct existing errors of this kind and how will they be treated in future?
@Mesibov I'll have to run the check and report back to our data providers to get this fixed. I'll also add the flags in our QC procedures to prevent this in the future.
Thanks to all for guidance on this.
GBIF.org now processes scientificNameID, taxonID and taxonConceptID for configured identifier schemes.
The following flags have been added to help publishers and consumers understand how the identifiers have been used and where ambiguities may be detected.
Published | Interpreted |
---|---|
TAXON_MATCH_SCIENTIFIC_NAME_ID_IGNORED, TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED, TAXON_MATCH_TAXON_ID_IGNORED | The …ID was not used when mapping the record to the GBIF backbone. This may indicate one of: |
SCIENTIFIC_NAME_ID_NOT_FOUND, TAXON_CONCEPT_ID_NOT_FOUND, TAXON_ID_NOT_FOUND | The …ID matched a known pattern, but it was not found in the associated checklist. The backbone lookup was performed using either the names or a different ID field from the record. This may indicate a poorly formatted identifier or may be caused by a newly created ID that isn't yet known in the version of the published checklist. |
SCIENTIFIC_NAME_AND_ID_INCONSISTENT | The scientificName provided in the occurrence record does not precisely match the name in the registered checklist when using the scientificNameID, taxonID or taxonConceptID to look it up. Publishers are advised to check the IDs are correct, or update the formatting of the names on their records. |
TAXON_MATCH_NAME_AND_ID_AMBIGUOUS | The GBIF Backbone concept was found using the scientificNameID, taxonID or taxonConceptID, but it differs from what would have been found if the classification names on the record were used. This may indicate a gap in the GBIF backbone, a poor mapping between the checklist and the backbone, or a mismatch between the classification names and the declared IDs (scientificNameID or taxonConceptID) on the occurrence record itself. |
The GBIF.org site and API allow search using issues, such as this example.
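For API users, a search by issue can be sketched as a URL builder like the one below. It assumes the occurrence search endpoint accepts these flag names through its `issue` parameter alongside `datasetKey` and `limit`; the helper name is hypothetical.

```python
from urllib.parse import urlencode

SEARCH = "https://api.gbif.org/v1/occurrence/search"


def issue_search_url(issue, dataset_key=None, limit=20):
    """Build an occurrence search URL that filters on one interpretation issue."""
    params = {"issue": issue, "limit": limit}
    if dataset_key:
        # Optionally restrict the search to a single dataset.
        params["datasetKey"] = dataset_key
    return SEARCH + "?" + urlencode(params)
```

Passing a dataset key narrows the result to one publisher's records, which is probably the most useful view when fixing flagged identifiers.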
Initially the WoRMS LSIDs have been enabled to bring consistency to GBIF and OBIS processing and to work through any teething issues. Other candidates for future use could be the International Plant Name Index LSIDs, Swedish Dyntaxa LSIDs, Catalogue of Life Identifiers, Zoobank LSIDs and Index Fungorum LSIDs.
Because this issue has become very long, and the original request from @bart-v and the OBIS team is implemented, this seems like a suitable point to close this issue. Please do open new issues for specific improvement requests, bugs etc., or raise them in https://discourse.gbif.org/
Excellent progress & impressive work. Much appreciated. Thanks a lot, @timrobertson100
This issue was moved from portal-feedback to pipelines
Example https://www.gbif-uat.org/occurrence/search?dataset_key=740cf4e0-37ca-4389-ba8f-4e1bc5177893&taxon_key=5401803
Lists the records as Oligochaeta and appends the authority "K.Koch" just like that. That makes these marine occurrences terrestrial plants...
While a scientificNameID urn:lsid:marinespecies.org:taxname:2036 is provided, that can be resolved to the animal class Oligochaeta.
This is a missed chance to fix homonyms in an easy way...