AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

recordedById duplicates #848

Closed kylie-m closed 7 months ago

kylie-m commented 8 months ago

When there are multiple identifiers, each identifier is duplicated. dwc:identifiedByID does not suffer from this problem.

In the ‘original vs processed’ window, dwc:recordedByID is only in the ‘original’ column, including the duplicated identifiers.

adam-collins commented 8 months ago

Please confirm what needs fixing. I interpret the request as:

  1. Continue to not use raw_identifiedByIds and raw_recordedByIds
  2. Treat recordedByID the same as identifiedByID, i.e. write the raw value into this field only once instead of twice, as currently happens.

adam-collins commented 8 months ago

@djtfmartin, @peggynewman please check that I understand the intention of this request as noted above.

The code indicates there was some work this year to move towards using the raw_ fields as well as using the processed value in the ID fields.

branch https://github.com/gbif/pipelines/tree/848_duplicates_in_recordedbyID

adam-collins commented 7 months ago

Changed my mind. Searches will work better if you can search for individual IDs.
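
To illustrate the point about searching for individual IDs, here is a minimal sketch, not the actual pipelines code. It assumes the raw value arrives as a single string delimited with ` | ` (the separator Darwin Core recommends for list values); the function name `split_ids` and the sample identifiers are hypothetical.

```python
def split_ids(raw_value, separator="|"):
    """Split a delimited identifier string into unique, trimmed IDs.

    Hypothetical helper: keeps the first occurrence of each ID and
    preserves the original order, so each ID can be indexed once.
    """
    seen = set()
    ids = []
    for part in (raw_value or "").split(separator):
        candidate = part.strip()
        # Skip empty fragments and anything already collected.
        if candidate and candidate not in seen:
            seen.add(candidate)
            ids.append(candidate)
    return ids


# Made-up identifiers, with the first one repeated as in the reported bug.
raw = "id:collector-1 | id:collector-2 | id:collector-1"
print(split_ids(raw))
```

Storing the result as a multi-valued field, rather than one concatenated string, is what would let a search match a single ID exactly.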

adam-collins commented 7 months ago

Pull request https://github.com/gbif/pipelines/pull/983

adam-collins commented 7 months ago

Included in version 2.18.0-SNAPSHOT.

adam-collins commented 4 months ago

Test with:

https://biocache-ws-test.ala.org.au/ws/occurrence/5623cce5-98af-4723-a332-517e0b575d94 (no duplicate recordedByID values)
https://biocache-ws.ala.org.au/ws/occurrence/5623cce5-98af-4723-a332-517e0b575d94 (duplicate recordedByID values)
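
A check like this could also be scripted. The sketch below is an assumption-laden illustration: it does not rely on any particular layout of the occurrence JSON, and instead searches the whole response for every field named `recordedByID`, reporting IDs that repeat within a single field. The helper names (`find_values`, `duplicated_ids`, `fetch_record`) are hypothetical, as is the assumption that a value is either a list or a `|`-delimited string.

```python
import json
from collections import Counter
from urllib.request import urlopen


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def duplicated_ids(record, key="recordedByID"):
    """Return the sorted IDs that occur more than once within a single `key` field."""
    dupes = set()
    for value in find_values(record, key):
        # Assumption: a value is either a list of IDs or a '|'-delimited string.
        items = value if isinstance(value, list) else str(value).split("|")
        counts = Counter(str(item).strip() for item in items)
        dupes.update(i for i, n in counts.items() if i and n > 1)
    return sorted(dupes)


def fetch_record(url):
    """Download and parse one occurrence record (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, `duplicated_ids(fetch_record(url))` could be run against each of the two occurrence URLs in this comment; an empty list would indicate no repeated IDs in any `recordedByID` field.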

kylie-m commented 4 months ago

Thanks Adam, working correctly for me. Also looking great in the UI - Testing passed!

peggynewman commented 2 days ago

LGTM