Encountering a situation where a static harvest of ~50k records, using an XPath query to extract the record_id, produces records with duplicate record_ids. After analysis, it is clear that records sharing a record_id are in fact identical (borne out by identical fingerprint fields).
It is possible to run a Merge/Duplicate job on this and remove the dupes, but it would be nice to de-dupe during the static harvest itself. Potentially on OAI harvest as well?
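For discussion, here is a minimal sketch of what harvest-time de-duplication could look like. It is not the harvester's actual code: the function names `fingerprint` and `dedupe_records`, the MD5-of-raw-XML fingerprint, and the `./id` XPath are all assumptions made for illustration. A record is dropped only when both its record_id and its fingerprint have already been seen, so records that share an id but differ in content are still kept.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint(record_xml: str) -> str:
    """Hypothetical fingerprint: MD5 hex digest of the raw record XML."""
    return hashlib.md5(record_xml.encode("utf-8")).hexdigest()


def dedupe_records(records, id_xpath):
    """Yield (record_id, record_xml) pairs, skipping exact repeats.

    A record is skipped only if the same (record_id, fingerprint) pair
    was already yielded. Records with no extractable id are always kept.
    """
    seen = set()
    for record_xml in records:
        root = ET.fromstring(record_xml)
        node = root.find(id_xpath)  # ElementTree supports a limited XPath subset
        has_id = node is not None and node.text
        record_id = node.text.strip() if has_id else None
        key = (record_id, fingerprint(record_xml))
        if record_id is not None and key in seen:
            continue  # same id and same content: treat as a duplicate
        seen.add(key)
        yield record_id, record_xml


# Example: two identical records and one distinct record.
sample = [
    "<record><id>a1</id><title>One</title></record>",
    "<record><id>a1</id><title>One</title></record>",
    "<record><id>a2</id><title>Two</title></record>",
]
unique = list(dedupe_records(sample, "./id"))
```

Because the check runs as records stream in, the same filter could in principle sit in front of an OAI harvest as well, at the cost of holding one (record_id, fingerprint) pair per record in memory.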