archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: For larger collections, Ingest starts before Transfer SIP created and produces empty AIP #759

Open uofmsean opened 5 years ago

uofmsean commented 5 years ago

Expected behaviour

Archivematica should create an AIP consistently for large transfers of 14000+ JPG files regardless of the current load on the database. An Ingest should never start before the Transfer SIP has been created.

Current behaviour

A race condition appears to occur between the createsipfromtransferobjects_v0.0 job, which creates the SIP at the end of the Transfer phase, and the start of the Ingest phase, which requires that SIP. When the Ingest starts before the Transfer SIP is created, the Ingest creates a new, empty SIP, and the remaining Ingest jobs fail to populate the AIP: each job fails to find its files, which now map to None UUIDs because the new Ingest SIP is empty.
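To make the suspected race concrete, here is a toy sketch of the pattern, not Archivematica's actual code. The dict stands in for the SIPs table, and `find_or_create_sip`, `simulate`, and the example path are hypothetical names chosen for illustration. The only thing that varies is whether the Transfer's SIP row is visible when the Ingest looks it up.

```python
import uuid


def find_or_create_sip(visible_rows, path):
    """Hypothetical find-or-create lookup: on a miss it silently makes a new SIP."""
    sip_uuid = visible_rows.get(path)
    if sip_uuid is None:
        # The Transfer's row is not visible yet, so a fresh (empty) SIP is registered.
        sip_uuid = str(uuid.uuid4())
        visible_rows[path] = sip_uuid
    return sip_uuid


def simulate(transfer_sip_visible):
    """Return (transfer_sip, ingest_sip) for one simulated hand-off."""
    visible_rows = {}  # toy stand-in for rows other processes can see
    path = "example-transfer/"  # illustrative path only
    transfer_sip = str(uuid.uuid4())
    if transfer_sip_visible:
        # Transfer finished registering its SIP before the Ingest started.
        visible_rows[path] = transfer_sip
    # Ingest starts and looks the SIP up by path.
    ingest_sip = find_or_create_sip(visible_rows, path)
    return transfer_sip, ingest_sip


if __name__ == "__main__":
    for visible in (True, False):
        transfer_sip, ingest_sip = simulate(visible)
        print(visible, "same SIP" if transfer_sip == ingest_sip else "new empty SIP")
```

When the row is visible, the Ingest reuses the Transfer SIP. When it is not, the Ingest ends up with a different UUID that has no files attached, which matches the behaviour seen in the failed run.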

In testing, a standalone MariaDB 5.5 database was used with a large transfer and everything worked as expected (even with 14000+ files). In another test, with all other settings identical but using a shared organizational MySQL 5.7.26 Enterprise database (which has a higher load), the same large transfer created an empty AIP because, when the Ingest started, it could not locate the Transfer SIP, which had not yet been created. Partial log file entries of the successful and failed transfers are included. The successful one (sample-log-success.txt) shows the SIP being created and the Ingest then locating it correctly. The failed one (sample-log-failure.txt) shows the Ingest failing to locate the SIP (which had not finished being created), after which a new empty SIP (with a new UUID) is created and used for the rest of the Ingest. To be clear, the Transfer SIPs (as registered in the database) do get created, just apparently not in time for the Ingest phase.

There appears to be some correlation between database load and the failure of larger transfers. In testing the standalone MariaDB 5.5 database against the shared MySQL 5.7.26 Enterprise database, passing a threshold of around 2600 JPG files caused the transfer to fail with MySQL but not with MariaDB. When the transfers failed, the same events occurred: the Transfer SIP could not be found and a new empty SIP was created for the Ingest. At a minimum, the Ingest should not continue if the Transfer SIP is not found, although a better solution would be to ensure the jobs are properly sequenced.
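As a rough illustration of the "at a minimum" option, the lookup could raise instead of creating a replacement SIP. This is only a sketch under assumptions: `TransferSipNotFound` and `require_transfer_sip` are hypothetical names, and the dict again stands in for the SIPs table rather than the real database layer.

```python
class TransferSipNotFound(Exception):
    """Raised when Ingest cannot find the SIP that Transfer should have created."""


def require_transfer_sip(visible_rows, path):
    """Return the SIP UUID registered for path, or stop instead of creating one.

    visible_rows is a toy stand-in (path -> SIP UUID) for the SIPs table.
    """
    sip_uuid = visible_rows.get(path)
    if sip_uuid is None:
        # Failing loudly here prevents the rest of Ingest from building an empty AIP.
        raise TransferSipNotFound("no SIP registered for %s" % path)
    return sip_uuid
```

With this shape, a missing row surfaces as an explicit failure at the start of the Ingest rather than as an empty AIP at the end of it.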

Steps to reproduce

It may be difficult to reproduce this in another environment if it is related to database load. It would be better to focus on the job sequencing and see if there is an issue where Ingest can occur before a Transfer SIP is created.

Your environment (version of Archivematica, OS version, etc)

Archivematica 1.9.1
Storage Service 0.14.1
Red Hat Enterprise Linux Server release 7.6 (Maipo)
MariaDB 5.5 or MySQL 5.7.26 Enterprise



uofmsean commented 5 years ago

The problem appears to be that the create_sip_from_transfer_objects.py client module moves the completed SIP to the autoProcessSIP directory before the DB transaction is fully committed (the commit takes longer for a larger job). Since autoProcessSIP is a watched directory, the MCPServer/archivematicaMCP.py script sees the new SIP and starts processing it, but the findOrCreateSipInDB function doesn't see the SIP in the database and creates a new one, which causes the Ingest phase to end up creating an empty AIP.

The findOrCreateSipInDB function doesn't use its waitSleep parameter to sleep for a grace period that would allow the other process to complete. I'm testing a simple fix that adds a time.sleep(waitSleep) at the start of the function and also calls the function with a larger value of 10 seconds rather than the default of 2.
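For comparison, here is a sketch of a variation on that fix: polling for the row up to a deadline rather than sleeping for one fixed period. It is not the change being tested above and not the real findOrCreateSipInDB. `wait_for_sip`, `lookup`, `timeout`, and `interval` are hypothetical names, and `lookup` is assumed to be any callable that returns a SIP UUID or None.

```python
import time


def wait_for_sip(lookup, path, timeout=10.0, interval=0.5, sleep=time.sleep):
    """Poll lookup(path) until it returns a SIP UUID or the grace period runs out.

    Returns the UUID, or None if the row never became visible. The grace period
    is measured as the sum of the sleep intervals, not as wall-clock time.
    """
    waited = 0.0
    while True:
        sip_uuid = lookup(path)
        if sip_uuid is not None:
            return sip_uuid
        if waited >= timeout:
            return None
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    rows = {}  # toy stand-in for the SIPs table

    def fake_sleep(seconds):
        # Pretend the Transfer's commit lands while the Ingest is waiting.
        rows["example-transfer/"] = "transfer-sip-uuid"

    print(wait_for_sip(rows.get, "example-transfer/", sleep=fake_sleep))
```

Polling returns as soon as the row appears, so small transfers are not delayed by the full grace period. Like the fixed sleep, it only narrows the window: the caller still has to decide what to do when None comes back, and committing before moving the directory would remove the race at its source.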