emory-libraries / blacklight-catalog


Spike: Determine how and where deduplication will take place. #49

Closed tmiles2 closed 3 years ago

tmiles2 commented 4 years ago

Topics/decision for discussion:

laura-ake commented 3 years ago

Michael Gibney from U. Penn libraries shared some information on Code4Lib Slack: "wrt deduping in Blacklight/Solr I thought I'd link to this: [https://github.com/upenn-libraries/solr-source-deduplication/] TL;DR: deduplication in Blacklight/Solr can be done as a pre-processing step, or at query-time via Solr "collapse", or (the main topic of the link above) cached "join" queries. There are benefits/drawbacks of each approach, but I favor the "join"-based approach, and am happy to discuss with anyone who's interested in implementing. The approach outlined in the linked writeup is still more or less valid/in use ... but there are some subtle changes/updates I'd recommend if you're actually planning to embark on implementation ..."
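For anyone comparing the options, here is a rough sketch of what the query-time "collapse" variant might look like from the client side. It only builds the request parameters and sends nothing. The field name `dedup_key_s`, the core name `catalog`, and the localhost URL are all made up for illustration; none of them come from our index or from the Penn writeup, and the "join" approach Michael prefers is not shown here.

```python
from urllib.parse import urlencode


def collapse_params(query, dedup_field="dedup_key_s", rows=10):
    """Build Solr select parameters that collapse results on one field.

    dedup_field is a hypothetical field holding a shared key for records
    that should be treated as duplicates of one another.
    """
    return {
        "q": query,
        # Solr's collapsing query parser keeps one document per distinct
        # value of the given field.
        "fq": "{!collapse field=%s}" % dedup_field,
        "rows": rows,
    }


def select_url(base_url, params):
    """Return the full /select URL for a Solr core (nothing is requested)."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical core URL and query, for illustration only.
params = collapse_params("title_t:hamlet")
url = select_url("http://localhost:8983/solr/catalog", params)
print(url)
```

The trade-off with this approach is that the collapsing happens on every search, so it needs a single-valued key field populated at index time, whereas a pre-processing step would merge the records before they ever reach Solr.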