MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

De-Duplicate on static harvest #393

Open ghukill opened 5 years ago

ghukill commented 5 years ago

Encountering a situation where a static harvest of ~50k records, using an XPath query to extract the record_id, produces multiple records that share the same record_id. After analysis, it is clear that records sharing a record_id are in fact identical (borne out by identical fingerprint fields).

It is possible to run a Merge/Duplicate job on this and remove the dupes, but it would be nice to de-dupe during the static harvest itself. Potentially on OAI harvest as well?
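
A rough sketch of what harvest-time de-duplication could look like, in plain Python rather than Combine's actual harvest code. Everything here is an assumption for illustration: the `dedupe_records` and `fingerprint` helpers are hypothetical, records are modelled as dicts with `record_id` and `document` keys, and the fingerprint is approximated as an MD5 hash of the document string. It keeps the first record per `record_id`, drops later copies with a matching fingerprint, and sets aside records that share a `record_id` but differ in content so they are not silently discarded.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: {"record_id": ..., "document": ...}
Record = Dict[str, str]


def fingerprint(document: str) -> str:
    """Assumed stand-in for a fingerprint field: MD5 of the raw document."""
    return hashlib.md5(document.encode("utf-8")).hexdigest()


def dedupe_records(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (unique, conflicts).

    unique:    first record seen for each record_id
    conflicts: later records whose record_id was already seen but whose
               fingerprint differs (not true duplicates, so not dropped silently)
    """
    seen: Dict[str, str] = {}  # record_id -> fingerprint of the kept record
    unique: List[Record] = []
    conflicts: List[Record] = []

    for record in records:
        record_id = record["record_id"]
        fp = fingerprint(record["document"])
        if record_id not in seen:
            seen[record_id] = fp
            unique.append(record)
        elif seen[record_id] != fp:
            conflicts.append(record)
        # Same record_id and same fingerprint: exact duplicate, skip it.

    return unique, conflicts
```

Called as `unique, conflicts = dedupe_records(harvested)`, a non-empty `conflicts` list would signal that the "same record_id means identical record" assumption does not hold for that harvest.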