Closed adambuttrick closed 2 weeks ago
I have tested this and added specific functionality to retry on failure, which the other task queues don't really do. I think EZID's handling of these task queues needs a more general solution, so that we know when failures happen and have a more robust retry mechanism. My mechanism retries every 5 minutes for up to a day after a failure, so it should handle most reasonable outages, maintenance windows, and other interruptions that are not extended or catastrophic.
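The retry schedule described above (every 5 minutes, for up to a day) can be sketched roughly as follows. This is only an illustration of the policy, not the code that was merged; the names `RETRY_INTERVAL`, `RETRY_WINDOW`, `next_retry_time`, and `max_retries` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the description above; the constant names are hypothetical.
RETRY_INTERVAL = timedelta(minutes=5)
RETRY_WINDOW = timedelta(days=1)


def next_retry_time(first_failure: datetime, last_attempt: datetime) -> Optional[datetime]:
    """Return the time of the next retry, or None once the window has elapsed."""
    candidate = last_attempt + RETRY_INTERVAL
    if candidate - first_failure > RETRY_WINDOW:
        return None  # give up: the outage is extended or catastrophic
    return candidate


def max_retries() -> int:
    """Number of fixed-interval retries that fit inside the retry window."""
    return RETRY_WINDOW // RETRY_INTERVAL


# Usage: a task that first failed at midnight.
first_failure = datetime(2024, 1, 1, 0, 0)
soon = next_retry_time(first_failure, first_failure)
expired = next_retry_time(first_failure, first_failure + RETRY_WINDOW)
```

With a fixed interval the number of attempts is bounded, so a task that keeps failing eventually stops retrying and can be surfaced for manual attention.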
How I tested the atomic functionality and created an error:
1. Edited `settings/settings.py` and misconfigured the connection to OpenSearch by putting in incorrect authentication.
2. Ran `python manage.py proc-search-indexer`.
3. Checked `ezidapp_searchindexerqueue` for the latest items in the queue. There is an entry for the update with a 403 error in the error message.
4. Checked `ezidapp_searchidentifier`. The record being written to the database does not exist, so the write is handled atomically (both must succeed or fail) rather than creating inconsistent state.
5. Restored `settings/settings.py`.

Released with https://github.com/CDLUC3/ezid/releases/tag/v3.2.19
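The behavior verified above (the database row is not kept when indexing fails, and the failure is logged to the queue) can be illustrated with a minimal, self-contained sketch. It is not EZID's implementation: it uses an in-memory SQLite database in place of the real database, a function that always raises in place of a misconfigured OpenSearch client, and simplified hypothetical tables `search_identifier` and `indexer_queue`.

```python
import sqlite3


class IndexingError(Exception):
    """Hypothetical stand-in for an OpenSearch authentication failure."""


def failing_index(doc):
    # Simulates the misconfigured connection used in the test above.
    raise IndexingError("403 Forbidden")


def dual_write(conn, identifier, index_fn):
    """Write the row and index it; roll the row back if indexing fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO search_identifier (identifier) VALUES (?)",
                (identifier,),
            )
            index_fn({"identifier": identifier})
    except Exception as exc:
        # Record the failure in a separate transaction so it survives the rollback.
        with conn:
            conn.execute(
                "INSERT INTO indexer_queue (identifier, error) VALUES (?, ?)",
                (identifier, str(exc)),
            )
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_identifier (identifier TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE indexer_queue (identifier TEXT, error TEXT)")

ok = dual_write(conn, "ark:/99999/fk4test", failing_index)
row_count = conn.execute("SELECT COUNT(*) FROM search_identifier").fetchone()[0]
queue_error = conn.execute("SELECT error FROM indexer_queue").fetchone()[0]
```

After the failed write, `row_count` is zero and the queue holds the 403 message, which mirrors what the manual check of the two tables showed.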
See ticket https://github.com/CDLUC3/ezid/issues/696 for information about queued task error handling. There are about 5 tasks that use the same pattern as OpenSearch where they log failures to the database but don't retry or notify.
Describe the functionality to be tested
We need to verify that the dual write process (DB and OpenSearch) fails gracefully and maintains data consistency when an error occurs during the update transaction, as described in #640.
Describe the test scenario
In the test environment for OpenSearch:
Expected outcome
Who would benefit from this test?
Devs responsible for maintaining the integrity of the search index.