I just stumbled on a few leftover OCRs on a server. In this case a PDF file was removed and replaced with a new one, but the old OCRs were still there. I can't confirm whether this happened because of me (e.g. I re-enqueued everything for reindexing, not letting the untracking take effect) or because https://github.com/esmero/strawberryfield/blob/1.4.0/src/EventSubscriber/StrawberryEventSaveFlavorSubscriber.php is not doing its job correctly, or is entering some kind of race condition where the untracking is overridden by another update (that would be a bug in the Search API).
To get to the obscure mechanics of "finding what needs to be deleted" when a file is removed or added, and to check that untracking for deletion is persistent, that it works, and that nothing else gets in the way (e.g. the new OCR processing starting first), I will have to do intensive testing. One idea I have: because a NODE reindex also requests (at the Data Source level) a Flavor reindex, there is a chance that, if that reindex is requested, the untracking for deletion has no effect.
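To make the suspected race easier to reason about, here is a toy simulation in Python. It is only a sketch under stated assumptions: `Tracker` is a hypothetical stand-in for an index tracker (not the Search API), the Flavor item IDs are invented, and it assumes a node reindex re-queues Flavor IDs from a snapshot taken before the file was replaced. It shows that the same two operations leave the old Flavors either gone or back in the queue depending only on their order.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical stand-in for an index tracker; NOT the Search API."""
    pending: set = field(default_factory=set)  # item IDs queued for (re)indexing

    def untrack(self, item_id):
        # Remove the item from the queue (the "untracking for deletion").
        self.pending.discard(item_id)

    def track_for_reindex(self, item_ids):
        # Queue items for indexing again.
        self.pending.update(item_ids)


def simulate(order):
    """Run 'untrack' and 'reindex_node' steps in the given order."""
    tracker = Tracker()
    # Invented Flavor IDs for the OCR pages of the OLD file.
    old = {"node:1/file:old/page:1", "node:1/file:old/page:2"}
    # Assumption: the node reindex computed its Flavor list before the file swap.
    stale_snapshot = set(old)
    tracker.track_for_reindex(old)
    for step in order:
        if step == "untrack":
            for item_id in old:
                tracker.untrack(item_id)
        elif step == "reindex_node":
            tracker.track_for_reindex(stale_snapshot)
    return tracker.pending
```

With `simulate(["reindex_node", "untrack"])` the queue ends up empty, while `simulate(["untrack", "reindex_node"])` brings both old Flavor IDs back, which is the "untracking has no effect" scenario.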
@alliomeria this is related to an open ticket from one of our users.
A backup solution to this problem would be to "save in a key/value" what we know needs to be untracked too, and use as a secondary mechanism a Search API hook named ::alterIndexedItems(), which needs to be present in a processor (we have a few we could use/re-use). If an already deleted item comes into the index, it would be intercepted there by querying this key/value from the DB, removed, and then the key/value entry would be deleted. Could this key/value be temporary? The only issue with temporary storage is that it is user-dependent, so the "deleter" will generate it, but the indexing will run as anonymous....
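The backup idea can be sketched in Python as follows. This is a minimal illustration, not the Drupal implementation: `PendingDeletions` is a hypothetical, dict-backed stand-in for a DB key/value collection, and `alter_indexed_items` only mirrors the idea of a Search API processor's `::alterIndexedItems()`, which receives the batch of items keyed by ID and can drop some of them before they are indexed.

```python
class PendingDeletions:
    """Hypothetical key/value collection of item IDs known to need deletion.

    In Drupal this would be persisted in the DB; here it is a plain dict.
    """

    def __init__(self):
        self._store = {}

    def mark(self, item_id, reason="file removed"):
        self._store[item_id] = reason

    def is_pending(self, item_id):
        return item_id in self._store

    def clear(self, item_id):
        self._store.pop(item_id, None)


def alter_indexed_items(items, pending):
    """Drop items already marked for deletion, then clear their markers.

    `items` maps item ID -> item, like the batch handed to an indexing
    processor; it is modified in place and also returned for convenience.
    """
    for item_id in list(items):  # copy keys: we delete while iterating
        if pending.is_pending(item_id):
            del items[item_id]
            pending.clear(item_id)
    return items
```

Clearing the marker right after the first interception follows the flow described above; if the same stale item could be re-queued more than once, keeping the marker until some expiry would be the safer trade-off.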
Another issue might be a misconfiguration of the Solr fields belonging to a Flavor. We make a Solr query to know what needs to be deleted, so maybe we should make some Solr fields "fixed" and untouchable... I saw a few examples somewhere on how to do that.
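Why a field misconfiguration would silently leave OCRs behind can be shown with a small Python sketch. The field names (`ss_parent_node`, `ss_file_uuid`, `ss_parent_nid`) are hypothetical, and `find_stale_flavor_ids` is an in-memory stand-in for running the Solr query; only the `field:"value" AND field:"value"` syntax is standard Lucene/Solr query syntax. If the index config renames a field the deletion logic relies on, the query matches nothing, so nothing is deleted and no error is raised.

```python
# Hypothetical field names the deletion logic expects; real names depend on the index config.
EXPECTED_FIELDS = {"parent_node": "ss_parent_node", "file_uuid": "ss_file_uuid"}


def build_q(node_id, file_uuid, fields=EXPECTED_FIELDS):
    """Build a Solr 'q' string selecting Flavor docs of one node/file pair."""
    return (
        f'{fields["parent_node"]}:"{node_id}" AND '
        f'{fields["file_uuid"]}:"{file_uuid}"'
    )


def find_stale_flavor_ids(docs, node_id, file_uuid, fields=EXPECTED_FIELDS):
    """In-memory stand-in for running build_q() against Solr; returns doc IDs."""
    p, f = fields["parent_node"], fields["file_uuid"]
    return sorted(d["id"] for d in docs if d.get(p) == node_id and d.get(f) == file_uuid)


docs = [
    {"id": "flavor-1", "ss_parent_node": "1", "ss_file_uuid": "old"},
    {"id": "flavor-2", "ss_parent_node": "1", "ss_file_uuid": "new"},
]
# With a (hypothetical) renamed field, the same lookup finds nothing to delete.
RENAMED = {"parent_node": "ss_parent_nid", "file_uuid": "ss_file_uuid"}
```

Here `find_stale_flavor_ids(docs, "1", "old")` returns `["flavor-1"]`, while passing `RENAMED` returns `[]`, which is the failure mode that "fixed" fields would guard against.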
Since Drupal has made "Read Committed" mandatory as the DB transaction isolation level, deleting and accessing key/values from the DB in a single PHP request might (might) lead to reading back what was already deleted. Drupal "commits" to the DB in a service destructor, which basically means any DB transaction (why?) happens once everything else is done. This requires us to use a service-level "cache" that always sees what was deleted inside a single PHP request. For a good example, see \Drupal\search_api\Utility\QueryHelper and how it caches results by IDs.
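The request-level "cache" idea can be sketched in Python. This is an illustration under assumptions, not Drupal code: `StaleStore` is a hypothetical store that deliberately keeps returning deleted rows, simulating the concern above that a read in the same request may not see a delete yet, and `RequestScopedKeyValue` is a hypothetical wrapper that would live for one request (in PHP, as a property on a service), remembering deletions so later reads never return them.

```python
class StaleStore:
    """Hypothetical store simulating reads that miss deletes until a later commit."""

    def __init__(self, rows):
        self._snapshot = dict(rows)  # what reads return during this request

    def get(self, key):
        return self._snapshot.get(key)

    def delete(self, key):
        # The delete is issued, but (in this simulation) not visible to reads yet.
        pass


class RequestScopedKeyValue:
    """Hypothetical wrapper that remembers deletions for the rest of the request."""

    def __init__(self, store):
        self._store = store
        self._deleted = set()  # keys deleted during this request
        self._cache = {}       # keys already read during this request

    def get(self, key):
        if key in self._deleted:
            return None  # never hand back something we already deleted
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]

    def delete(self, key):
        self._store.delete(key)
        self._deleted.add(key)
        self._cache.pop(key, None)
```

After `kv.delete(key)`, `kv.get(key)` returns `None` even though the underlying `StaleStore` still returns the old value, which is the behaviour the per-request cache is meant to guarantee.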
So, I can't reproduce this after a lot of testing: running 1.4.0 with READ COMMITTED transaction isolation and doing exactly that repeatedly, adding/removing OCR over and over.
What?
mmmm