Closed RussellMcOrmond closed 4 years ago
The new process seems to be working. Rather than trying to upload 3 times and checking validation only once, we now try up to 5 times and pair a validation with each upload attempt -- this means that a failed validation also triggers a retry.
I also added a 30-second delay before each retry, since most of the errors so far have been caused by pending container updates.
2020/01/17 18:16:54 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Accepted job. ingestReq = {"processdate":"2020-01-17T18:16:54Z","type":"new","changelog":"Ingesting newspaper issues","date":"2020-01-17T12:57:43Z","processhost":"jarlsberg-ingest.tor.c7a.ca","request":"ingest"}
2020/01/17 18:16:55 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Created new AIP in /home/tdr/tempIngest/oocihm.N_00693_18991129
2020/01/17 18:17:23 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Changelog: Ingesting newspaper issues
2020/01/17 18:17:23 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Copying /home/tdr/tempIngest/oocihm.N_00693_18991129 to Swift
2020/01/17 18:17:31 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Swift copy of /home/tdr/tempIngest/oocihm.N_00693_18991129 complete, Validating
2020/01/17 18:17:33 - WARN {CIHM.TDR} [CIHM::WIP::Ingest::Worker::warnings] oocihm.N_00693_18991129: validation of oocihm.N_00693_18991129 failed
2020/01/17 18:18:10 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Swift copy of /home/tdr/tempIngest/oocihm.N_00693_18991129 complete, Validating
2020/01/17 18:18:10 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Done processing
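For anyone reading along, the retry flow described above can be sketched roughly as follows. This is only an illustration in Python, not the actual CIHM::WIP code: `upload`, `validate`, and `upload_with_validation` are hypothetical names, and the sketch assumes both callables simply return true or false.

```python
import time
from typing import Callable


def upload_with_validation(
    upload: Callable[[], bool],
    validate: Callable[[], bool],
    max_attempts: int = 5,
    retry_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Pair every upload attempt with its own validation.

    `upload` and `validate` are hypothetical callables standing in for
    the Swift copy and the manifest check. Returns the number of the
    attempt that succeeded, or raises RuntimeError if none did.
    """
    for attempt in range(1, max_attempts + 1):
        # A failed upload and a failed validation are handled the same
        # way: both lead to another full upload-plus-validate attempt.
        if upload() and validate():
            return attempt
        if attempt < max_attempts:
            # Give pending container updates time to settle.
            sleep(retry_delay)
    raise RuntimeError(f"no valid upload after {max_attempts} attempts")
```

Passing `sleep` as a parameter is just a convenience so the delay can be stubbed out when trying the function without waiting 30 seconds per retry.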
We are having a problem that shows up during bursts of updates.
Sometimes an AIP fails validation. Validation takes the MD5 of each object from the Swift container listing and checks it against the BagIT manifest. When container updates are still pending the BagIT check can fail, even though things are fine soon afterwards.
We then hit 'retry' and it tries again. This sends a similar AIP, but with different dates on the changelog and thus a different manifest.
What has happened a few times is that the changelog from one revision and the manifest from a different revision end up being the current files once all the pending container updates complete.
Example:
The md5 listed in the manifest, which is the same manifest on both computers, is 4562a176fc4f03025e353c00e0e5fd4b
If I upload the correct changelog file to Swift, then everything works fine.
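To make the failure mode concrete, here is a rough Python sketch of the kind of check described above: comparing the MD5 values in a BagIT manifest against the hashes reported by a container listing. It is not the validator used here. `parse_manifest` and `find_mismatches` are hypothetical helpers, the object names are assumed to be already relative to the AIP root, the path `data/changelog.txt` is only a placeholder, and the sample hashes are computed from placeholder bytes rather than taken from a real AIP.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def parse_manifest(text: str) -> Dict[str, str]:
    """Parse manifest lines of the form '<md5> <path>' into {path: md5}."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        entries[path.strip()] = digest.lower()
    return entries


def find_mismatches(
    manifest: Dict[str, str],
    container_hashes: Dict[str, str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (path, expected_md5, actual_md5) for each file that differs.

    `container_hashes` is assumed to map object names to the MD5 reported
    by the container listing; a missing object shows up as None.
    """
    mismatches = []
    for path, expected in sorted(manifest.items()):
        actual = container_hashes.get(path)
        if actual != expected:
            mismatches.append((path, expected, actual))
    return mismatches


# Placeholder data: the manifest comes from the retry, but the container
# still lists the changelog written by the first attempt.
first = hashlib.md5(b"changelog from first attempt").hexdigest()
retry = hashlib.md5(b"changelog from retry").hexdigest()
manifest = parse_manifest(f"{retry}  data/changelog.txt\n")
listing = {"data/changelog.txt": first}
print(find_mismatches(manifest, listing))
```

With a mixed-revision state like this the changelog is reported as a mismatch, and it stays that way until a changelog matching the manifest is uploaded.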