emory-libraries / blacklight-catalog


SPIKE: Investigate why records removed from Alma are not being removed from Solr #1173

Closed by lovinscari 2 years ago

lovinscari commented 2 years ago

A user reported that a search for Washington Post caused the real-time availability API to malfunction (spin endlessly). We found that MMS ID 9936495300002486 is no longer in Alma but is still showing in Solr, which is what caused the malfunction. The record was removed from Alma on 12/8/21, so I would have expected it to be removed from Solr after our last reindex on 1/4/22, but it was not. Please investigate any issues with this process and, if this was more than a fluke (one-off), submit a new ticket for the work needed to resolve it.

bwatson78 commented 2 years ago

Theory on the most likely reason it hasn't been deleted: the CRON jobs weren't working prior to 12-16-2021. The most logs I've seen contain a given MMSID tagged for deletion is 5 (jobs run every 6 hours, so an ID shows as deleted for about a 30hr stretch). Since we only reindexed at our last deploy (we didn't wipe and index), we're going to have more of these show up. Reindexing single MMSIDs will both validate their deletion and remove them from Solr. Attempting that now...
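To make the timing argument concrete, here is a minimal Ruby sketch. It assumes the figures observed above (a 6-hour job interval and at most 5 runs in which a deleted ID is still reported) describe the whole window, and it treats the dates as midnight UTC because the exact times are not known. `deletion_missed?` is a hypothetical helper, not part of the application.

```ruby
# Assumption: a deleted MMS ID is reported by the harvest for at most
# 5 consecutive runs at a 6-hour interval (the figures observed in the logs).
RUN_INTERVAL_HOURS = 6
MAX_RUNS_SHOWING_DELETE = 5

# Hypothetical helper: true when the harvest only resumed after the
# window in which the delete was still being reported had closed.
def deletion_missed?(removed_at, harvest_resumed_at)
  window_seconds = RUN_INTERVAL_HOURS * MAX_RUNS_SHOWING_DELETE * 3600
  (harvest_resumed_at - removed_at) > window_seconds
end

removed = Time.utc(2021, 12, 8)   # record removed from Alma (time of day unknown)
resumed = Time.utc(2021, 12, 16)  # CRON jobs working again
puts deletion_missed?(removed, resumed)
```

Under these assumptions, the eight-day gap is far longer than the roughly 30-hour window, which is consistent with the delete never reaching Solr.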

bwatson78 commented 2 years ago

-bash-4.2$ RAILS_ENV=production bin/rails marc_index_ingest oai_single_id=9936495300002486
I, [2022-01-06T20:15:36.976011 #13748]  INFO -- : Setting 'from' time: 2022-01-06T20:11:07Z
I, [2022-01-06T20:15:37.007489 #13748]  INFO -- : Calling OAI with query string: ?verb=GetRecord&identifier=oai:alma.01GALI_EMORY:9936495300002486&metadataPrefix=marc21
I, [2022-01-06T20:15:37.260378 #13748]  INFO -- : Starting record count: 0
I, [2022-01-06T20:15:37.260791 #13748]  INFO -- : Deleted IDs: ["9936495300002486"]
I, [2022-01-06T20:15:37.260876 #13748]  INFO -- : Suppressed IDs: []
I, [2022-01-06T20:15:37.260953 #13748]  INFO -- : Lost/Stolen IDs: []
I, [2022-01-06T20:15:37.261023 #13748]  INFO -- : Deactivated Portfolio IDs: []
I, [2022-01-06T20:15:37.261104 #13748]  INFO -- : Temporarily Located IDs: []
I, [2022-01-06T20:15:37.261171 #13748]  INFO -- : Found 1 delete records.
I, [2022-01-06T20:15:37.261356 #13748]  INFO -- : 0 records retrieved
I, [2022-01-06T20:15:37.261430 #13748]  INFO -- : Active IDs: []
I, [2022-01-06T20:15:37.323544 #13748]  INFO -- : {"responseHeader"=>{"status"=>0, "QTime"=>37}}

This confirms that 9936495300002486, and others like it, can be removed from Solr simply by reindexing their MMSIDs. I'd recommend that future finds like this be filed as bug tickets so that we can scour the logs and then attempt to reindex them.
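As a rough illustration of scouring the logs, the Ruby sketch below pulls MMS IDs out of `Deleted IDs: [...]` lines in the format shown above. `deleted_ids_from_log` is a hypothetical helper, and it assumes the logged array of quoted IDs is also valid JSON, which holds for plain numeric strings like the one in this ticket.

```ruby
require "json"

# Hypothetical helper: collect every MMS ID found on a
# "Deleted IDs: [...]" log line, without duplicates.
def deleted_ids_from_log(lines)
  lines.flat_map do |line|
    match = line.match(/Deleted IDs: (\[.*\])/)
    # Assumption: the logged array of quoted IDs parses as JSON.
    match ? JSON.parse(match[1]) : []
  end.uniq
end

sample = [
  'I, [2022-01-06T20:15:37.260791 #13748]  INFO -- : Deleted IDs: ["9936495300002486"]',
  'I, [2022-01-06T20:15:37.260876 #13748]  INFO -- : Suppressed IDs: []'
]
puts deleted_ids_from_log(sample).inspect
```

Each ID it returns could then be passed to the `oai_single_id` run shown earlier to confirm the deletion and clear it from Solr.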

lovinscari commented 2 years ago

Thanks @bwatson78 for investigating this so quickly. I will note any reports like this as bug tickets in the future.