CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Recovery of submission errors #553

Open marisastrong opened 3 years ago

marisastrong commented 3 years ago

At times we have brief outage windows during which the database is unavailable. The current plan is to schedule a pause in submissions of up to an hour (a 30-minute window plus 30 minutes of queue drain time) to prevent errors that might occur during the outage.

How much work is involved in recovering items that fail during a brief outage of a few minutes, compared with holding items in the queue for 60 minutes?

https://confluence.ucop.edu/display/UC3/MerrittOperations#MerrittOperations-reprocess-objects-inv

elopatin-uc3 commented 3 years ago

One related comment regarding ETDs. ETD submissions are ingested each morning at 7:30 AM. They are unique in that associated processes run around the same time to add information to the ETD database and prepare files for processing the following day. If ingest failures occur, some of these associated processes do not complete, and cleanup is very time-consuming.

elopatin-uc3 commented 3 years ago

Submission types:

Related: Is it possible to provide visibility via the admin tool? What information about submission failures could we provide, and how?

marisastrong commented 3 years ago

Tracking various scenarios, troubleshooting, and resolution:

https://confluence.ucop.edu/display/UC3/Failed+Submissions