AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Store duplicate record links in a non-Darwin Core field for linked records #313

Closed ansell closed 2 years ago

ansell commented 5 years ago

The ALA currently uses the Darwin Core field associatedOccurrences to store the results of its internal duplicate record detector. This conflicts with guidance that ALA data providers have been given in the past to use associatedOccurrences to link legitimate occurrence records that need to be linked together for whatever reason.

If a non-standard field were used for the results of the internal ALA duplicate record detector, there would be no conflict, and it would allow us to remove the following hack, which is in place to prevent the duplicate record declarations from being overwritten by standardised use of the field:

https://github.com/AtlasOfLivingAustralia/biocache-store/blob/master/src/main/scala/au/org/ala/biocache/dao/OccurrenceDAOImpl.scala#L51

It may also make it possible to resolve this issue, raised by an ALA data provider, regarding record links: https://github.com/AtlasOfLivingAustralia/data-management/issues/375

Because of the prohibition in OccurrenceDAOImpl, it would be impossible to clean up the associatedOccurrences field in that case using our standard processor.
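
To make the proposal concrete, here is a minimal sketch of the separation, written in Python for brevity rather than in biocache-store's Scala. It is not the actual biocache-store code: the field name duplicateRecordLinks, both function names, and the pipe-separated storage format are assumptions made for illustration only. The idea is that the detector owns its own field, so provider updates to associatedOccurrences (including clearing it) no longer need a special-case guard.

```python
# Illustrative sketch only; nothing here is taken from biocache-store.
PROVIDER_LINKS_FIELD = "associatedOccurrences"  # Darwin Core term, owned by the data provider
DUPLICATE_LINKS_FIELD = "duplicateRecordLinks"  # hypothetical non-standard field, owned by the detector


def apply_provider_update(record, update):
    """Merge a provider-supplied update into a record (hypothetical helper).

    Every provider field, including associatedOccurrences, can be set or
    cleared freely. The detector-owned field is simply ignored here.
    """
    merged = dict(record)
    for field, value in update.items():
        if field == DUPLICATE_LINKS_FIELD:
            continue  # providers cannot write the detector's field
        if value is None or value == "":
            merged.pop(field, None)  # an empty value clears the field
        else:
            merged[field] = value
    return merged


def apply_duplicate_detection(record, duplicate_ids):
    """Store the detector's results in its own field (hypothetical helper)."""
    merged = dict(record)
    if duplicate_ids:
        # Pipe-separated storage is an assumption of this sketch.
        merged[DUPLICATE_LINKS_FIELD] = "|".join(sorted(duplicate_ids))
    else:
        merged.pop(DUPLICATE_LINKS_FIELD, None)
    return merged


# Usage: the provider clears its own links without touching the detector's results.
record = {"occurrenceID": "occ-1", PROVIDER_LINKS_FIELD: "occ-9"}
record = apply_duplicate_detection(record, {"occ-3", "occ-2"})
record = apply_provider_update(record, {PROVIDER_LINKS_FIELD: ""})
print(record)
```

With this split, the cleanup described above becomes an ordinary provider update, and the two kinds of link can never overwrite each other.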

brucehyslop commented 2 years ago

biocache-store has been replaced by pipelines.