esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

Use Case: Audit Trail for Solr Index Activity #311

Open noahwsmith opened 5 months ago

On a large project with workflows that repeatedly update objects, we see continuous Solr indexing activity as objects are touched and then requeued for indexing. In particular, objects that have OCR/HOCR take a long time to clear the queues, and if an object is touched several times in a day, we suspect that the full object (including OCR) may be reindexed several times, even though sometimes the only thing that has changed on the object is the workflow state.

To get some visibility on how often objects are being reindexed in Solr, it would be amazing to be able to read an audit log for indexing activity on a given object. What options might there be to do this?