crkn-rcdr / CIHM-TDR

CIHM::TDR perl module. Used for manipulating the repository (ingest, replication, fixity, etc)

AIP upload to Swift -- mismatching revisions. #7

Closed RussellMcOrmond closed 4 years ago

RussellMcOrmond commented 4 years ago

We are having a problem that is showing up with the bursts of updates.

Sometimes an AIP fails validation. Validation takes the MD5 values from the Swift container information and checks them against the BagIt manifest. Sometimes there are pending container updates, and soon after the BagIt check fails the updates complete and things are fine.
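As a rough illustration of the check described above, the sketch below compares a BagIt MD5 manifest against a mapping of object names to MD5 hashes. It is not the CIHM::TDR code: `parse_manifest` and `find_mismatches` are hypothetical names, the `listing` dictionary stands in for whatever the Swift container listing returns, and the hash placed in `listing` is an assumption chosen from the two hashes in this report.

```python
def parse_manifest(text):
    """Parse BagIt manifest-md5.txt content into {path: md5}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each manifest line is "<checksum> <path>", separated by whitespace.
        digest, path = line.split(None, 1)
        if path.startswith("*"):
            # Some tools prefix the path with "*" (binary-mode marker).
            path = path[1:]
        entries[path] = digest.lower()
    return entries


def find_mismatches(manifest, listing):
    """Return {path: (expected, actual)} where the listed hash differs.

    `listing` maps object name -> MD5 as reported by the container
    listing (hypothetical input shape). A missing object yields None.
    """
    mismatches = {}
    for path, expected in manifest.items():
        actual = listing.get(path)
        if actual != expected:
            mismatches[path] = (expected, actual)
    return mismatches


# Illustrative data only: the manifest hash is the one quoted in this
# report; the listing hash is ASSUMED to be the other changelog's MD5.
manifest_text = "4562a176fc4f03025e353c00e0e5fd4b  data/changelog.txt\n"
listing = {"data/changelog.txt": "56024101451bdff48e9abcd925c27b33"}

mismatches = find_mismatches(parse_manifest(manifest_text), listing)
for path, (expected, actual) in mismatches.items():
    print(f"{path}: manifest says {expected}, container says {actual}")
```

A check like this only reports what the listing says at that moment, so a listing that is still catching up on pending container updates can produce a failure that disappears a little later.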

We then hit 'retry' and it tries again. This sends a similar AIP, but with different dates on the changelog and thus a different manifest.

What has happened a few times is that the changelog from one revision and the manifest from a different revision end up being the current files once all the pending container updates complete.

Example:

root@paneer-repomanage:/cihmz2/repository/incoming/oocihm.N_00700_19130528# md5sum data/changelog.txt
56024101451bdff48e9abcd925c27b33  data/changelog.txt
root@paneer-repomanage:/cihmz2/repository/incoming/oocihm.N_00700_19130528# cat data/changelog.txt
2020-01-16T03:33:04Z  Created new AIP
2020-01-16T03:33:40Z  Ingesting newspaper issues
root@paneer-repomanage:/cihmz2/repository/incoming/oocihm.N_00700_19130528# 
tdr@romano-repomanage:/cihmz1/repository/aip/oocihm/953/oocihm.N_00700_19130528$ md5sum data/changelog.txt 
4562a176fc4f03025e353c00e0e5fd4b  data/changelog.txt
tdr@romano-repomanage:/cihmz1/repository/aip/oocihm/953/oocihm.N_00700_19130528$ cat data/changelog.txt
2020-01-16T05:49:16Z  Created new AIP
2020-01-16T05:49:52Z  Ingesting newspaper issues
tdr@romano-repomanage:/cihmz1/repository/aip/oocihm/953/oocihm.N_00700_19130528$ 

The md5 listed in the manifest, which is the same manifest on both computers, is 4562a176fc4f03025e353c00e0e5fd4b

If I upload the correct changelog file to Swift, then everything works fine.

RussellMcOrmond commented 4 years ago

New process seems to be working. Rather than trying to upload 3 times and only checking validation once, we now try 5 times and couple a validation with each upload attempt -- this means that the validation can fail as well and retry.

I also added a 30-second delay before each retry, given that most of the errors so far have been because of pending container updates.
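The retry behaviour described above could be sketched roughly as follows. This is an illustration in Python rather than the actual Perl change: `upload_with_validation`, `upload`, and `validate` are hypothetical names, and both callbacks are assumed to return `True` on success. The `sleep` parameter exists only so the example can run without waiting.

```python
import time


def upload_with_validation(upload, validate, attempts=5, delay_seconds=30,
                           sleep=time.sleep):
    """Upload and validate as one unit, retrying the pair on failure.

    Returns the number of the attempt that succeeded, or raises
    RuntimeError once every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        # Validation runs after every upload, so a validation failure
        # triggers a retry just like an upload failure does.
        if upload() and validate():
            return attempt
        if attempt < attempts:
            # Give pending container updates time to settle.
            sleep(delay_seconds)
    raise RuntimeError(f"upload failed validation after {attempts} attempts")


# Example: validation fails once, then passes on the second attempt.
results = iter([False, True])
attempt_used = upload_with_validation(
    upload=lambda: True,
    validate=lambda: next(results),
    sleep=lambda seconds: None,  # skip the real delay in this demo
)
print(f"succeeded on attempt {attempt_used}")
```

Pairing each upload with its own validation keeps a failed check from being the final word, which matches the log below where the first validation fails and the second attempt completes.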

2020/01/17 18:16:54 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Accepted job. ingestReq = {"processdate":"2020-01-17T18:16:54Z","type":"new","changelog":"Ingesting newspaper issues","date":"2020-01-17T12:57:43Z","processhost":"jarlsberg-ingest.tor.c7a.ca","request":"ingest"}
2020/01/17 18:16:55 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Created new AIP in /home/tdr/tempIngest/oocihm.N_00693_18991129
2020/01/17 18:17:23 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Changelog: Ingesting newspaper issues
2020/01/17 18:17:23 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Copying /home/tdr/tempIngest/oocihm.N_00693_18991129 to Swift
2020/01/17 18:17:31 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Swift copy of /home/tdr/tempIngest/oocihm.N_00693_18991129 complete, Validating
2020/01/17 18:17:33 - WARN {CIHM.TDR} [CIHM::WIP::Ingest::Worker::warnings] oocihm.N_00693_18991129: validation of oocihm.N_00693_18991129 failed
2020/01/17 18:18:10 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Swift copy of /home/tdr/tempIngest/oocihm.N_00693_18991129 complete, Validating
2020/01/17 18:18:10 - INFO {CIHM.TDR} [CIHM::WIP::Ingest::Process::process] oocihm.N_00693_18991129: Done processing