esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

Are we sure Flavors are being removed on File deletion/removal from an ADO? #307

Open DiegoPino opened 6 months ago

DiegoPino commented 6 months ago

What?

I just stumbled on a few leftover OCRs on a server. In that case a PDF file was removed and replaced with a new one, but the old OCRs were still there. I can't confirm whether this happened because of me (e.g. I re-enqueued everything to be reindexed, not letting the untracking take effect), because https://github.com/esmero/strawberryfield/blob/1.4.0/src/EventSubscriber/StrawberryEventSaveFlavorSubscriber.php is not doing its job correctly, or because of some kind of race condition where the untracking is overridden by a subsequent update (that would be a bug in the Search API).

Getting to the bottom of the obscure mechanics of "finding what needs to be deleted" when a file is removed or added will require intensive testing: checking that untracking for deletion is persistent and works, and that nothing else gets in the way (e.g. the new OCR processing starting first). One idea I have: because a NODE reindex also requests (at the Data Source level) a Flavor reindex, there is a chance that, if that reindex is requested, the untracking for deletion has no effect.
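To make the suspected ordering problem concrete, here is a toy Python model. It is not Search API code: `ToyTracker`, `simulate` and the item IDs are hypothetical. It assumes a reindex request whose list of Flavor IDs was computed while the old PDF was still attached, and shows that replaying that request after the untracking re-queues the old OCR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTracker:
    """Hypothetical stand-in for a Search API tracker plus Solr backend (not the real API)."""

    pending: set[str] = field(default_factory=set)  # IDs queued for (re)indexing
    indexed: set[str] = field(default_factory=set)  # IDs currently in the backend

    def track(self, item_id: str) -> None:
        # Queue an item for (re)indexing.
        self.pending.add(item_id)

    def untrack_and_delete(self, item_id: str) -> None:
        # Drop the item from the queue and from the backend.
        self.pending.discard(item_id)
        self.indexed.discard(item_id)

    def run_indexing(self) -> None:
        # Index everything still queued, then clear the queue.
        self.indexed |= self.pending
        self.pending.clear()


def simulate(stale_reindex_after_untrack: bool) -> set[str]:
    # Hypothetical IDs: one ADO (node:1) whose old PDF had an OCR Flavor.
    tracker = ToyTracker(indexed={"node:1", "flavor:1:old-ocr"})
    # Assumed: a reindex request computed while the old PDF was still attached.
    stale_request = ["node:1", "flavor:1:old-ocr"]
    tracker.untrack_and_delete("flavor:1:old-ocr")  # old PDF removed
    tracker.track("flavor:1:new-ocr")  # OCR of the replacement PDF
    if stale_reindex_after_untrack:
        for item_id in stale_request:
            tracker.track(item_id)  # re-queues the already untracked Flavor
    tracker.run_indexing()
    return tracker.indexed
```

`simulate(False)` ends with `{"node:1", "flavor:1:new-ocr"}`, while `simulate(True)` also contains `"flavor:1:old-ocr"`: in this model the untracking only "sticks" if nothing re-tracks the old ID afterwards.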

@alliomeria this is related to an open ticket of one of our users

A backup solution to this problem would be to "save in a key/value" store what we know needs to be untracked too, and use as a secondary mechanism something found in the Search API: the processor method ::alterIndexedItems(), which has to be present in a processor (we have a few we could use/re-use). If an already deleted item comes into the index, it would be intercepted there by querying this key/value from the DB, removed, and then the key/value entry would be deleted. Could this key/value be temporary? The only issue with temporary storage is that it is user-dependent, so the "deleter" will generate it, but the indexing will run as anonymous...
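A minimal Python sketch of that backup idea, assuming a shared (not per-user) key/value store so an anonymous indexing run can read what the "deleter" wrote. `TombstoneStore`, `alter_indexed_items` and the item IDs are hypothetical; in Drupal this logic would live in a PHP processor implementing `::alterIndexedItems()`, which receives the batch of items by reference and can remove entries from it.

```python
from __future__ import annotations


class TombstoneStore:
    """Hypothetical shared key/value of item IDs that must not be (re)indexed.

    Stands in for a DB-backed store that is not tied to the current user.
    """

    def __init__(self) -> None:
        self._ids: set[str] = set()

    def add(self, item_id: str) -> None:
        # Written by the "deleter" when a file/Flavor is removed.
        self._ids.add(item_id)

    def __contains__(self, item_id: str) -> bool:
        return item_id in self._ids

    def consume(self, item_id: str) -> bool:
        # Return True and delete the entry if the item was marked as deleted.
        if item_id in self._ids:
            self._ids.remove(item_id)
            return True
        return False


def alter_indexed_items(items: dict[str, dict], store: TombstoneStore) -> None:
    """Toy analogue of a processor's alterIndexedItems(array &$items): mutates items in place."""
    for item_id in list(items):  # copy the keys because we delete while iterating
        if store.consume(item_id):
            del items[item_id]  # intercepted: never reaches the backend
```

For example, after `store.add("flavor:1:old-ocr")`, a batch `{"node:1": {}, "flavor:1:old-ocr": {}, "flavor:1:new-ocr": {}}` is reduced to `node:1` and `flavor:1:new-ocr`, and the tombstone is gone. Deleting the entry on first hit, as proposed, means a second late stale reindex would slip through; keeping entries with an expiry would be the alternative trade-off.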

Another issue might be a misconfiguration of the Solr fields belonging to a Flavor, since we make a Solr query to know what needs to be deleted. So maybe we should make some Solr fields "fixed" and untouchable... I saw a few examples somewhere of how to do that.

mmmm

DiegoPino commented 6 months ago

Some findings:

DiegoPino commented 6 months ago

Another finding:

DiegoPino commented 6 months ago

So, I can't reproduce this after a lot of testing, running 1.4.0 with READ COMMITTED transaction isolation and doing exactly that repeatedly: adding/removing OCR over and over.