Closed kylie-m closed 7 months ago
Please confirm what needs fixing. I interpret the request as:
raw_identifiedByIds and raw_recordedByIds
recordedByID the same as identifiedByID, i.e. only put the raw value into this field once instead of twice, as happens currently.
@djtfmartin, @peggynewman please check that I understand the intention of this request as noted above.
The code indicates there was some work this year to move towards using the raw_ fields as well as using the processed value in the ID fields.
branch https://github.com/gbif/pipelines/tree/848_duplicates_in_recordedbyID
Changed my mind. Searches will work better if you can search for individual IDs.
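To illustrate what searching on individual IDs means, here is a minimal Python sketch. It is not the code in the pipelines branch. It assumes the raw recordedByID arrives as a single string with IDs separated by " | " (the list separator Darwin Core recommends); the function name split_ids and the example.org values are hypothetical.

```python
def split_ids(raw_value):
    """Split a raw, pipe-delimited ID string into unique individual IDs.

    Hypothetical helper for illustration only; the separator is an assumption.
    """
    if not raw_value:
        return []
    # Trim whitespace around each part and drop empty parts.
    parts = [part.strip() for part in raw_value.split("|")]
    parts = [part for part in parts if part]
    # dict.fromkeys keeps first-seen order while removing repeats.
    return list(dict.fromkeys(parts))


# Hypothetical raw value in which each ID appears twice.
raw = "https://example.org/id/1 | https://example.org/id/2 | https://example.org/id/1 | https://example.org/id/2"
individual_ids = split_ids(raw)
```

Each element of individual_ids can then be indexed as its own value, so a search for one ID matches the record without needing the whole concatenated string.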
Pull request https://github.com/gbif/pipelines/pull/983, in version 2.18.0-SNAPSHOT
Test with:
https://biocache-ws-test.ala.org.au/ws/occurrence/5623cce5-98af-4723-a332-517e0b575d94 (no duplicate recordedByID values)
https://biocache-ws.ala.org.au/ws/occurrence/5623cce5-98af-4723-a332-517e0b575d94 (duplicate recordedByID values)
Thanks Adam, working correctly for me. Also looking great in the UI - Testing passed!
LGTM
When there are multiple identifiers, each identifier is duplicated. dwc:identifiedByID does not suffer from this problem.
In the ‘original vs processed’ window, dwc:recordedByID is only in the ‘original’ column, including the duplicated identifiers.
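A quick way to confirm the symptom described above on a given record is to count how often each value occurs. The Python sketch below assumes the recordedByID values have already been extracted into a list; the function name duplicated_ids and the sample values are hypothetical, and nothing here reflects the actual biocache response format.

```python
from collections import Counter


def duplicated_ids(ids):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical values showing the reported pattern: every identifier repeated.
reported = ["id-A", "id-B", "id-A", "id-B"]
fixed = ["id-A", "id-B"]
```

An empty result from duplicated_ids(fixed) corresponds to the expected behaviour after the fix, while duplicated_ids(reported) lists every identifier that was repeated.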