
Problem: Full reingest fails when Storage Service and Pipeline are on separate servers (under some system configuration conditions) #1663

Open hakamine opened 7 months ago

hakamine commented 7 months ago

Expected behaviour

Full reingest should work irrespective of the system configuration (provided the configuration is correct)

Current behaviour

Attempting a full reingest fails when the Storage Service and the pipeline run on separate servers, i.e., when the Storage Service internal processing location and the pipeline's staging path are not the same directory (see the analysis below):

Steps to reproduce

Your environment (version of Archivematica, operating system, other relevant details)

AM version: stable/1.15.x; SS version: stable/0.21.x (the same problem may have also occurred in previous versions, ref. issue https://github.com/archivematica/Issues/issues/1456)
OS: Rocky 9 / RHEL 9 (the issue seems to be unrelated to the OS)

Additional Information/Analysis

After triggering a reingest, and before approving the transfer, the transfer files end up "enclosed" under a duplicated directory level in the watched directory, for example:

watchedDirectories/activeTransfers/standardTransfer/
└── [   60]  oneimage2-8e8ff0f8-eef5-4401-bc2a-352f45b80b45
    └── [  112]  oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387
        ├── [  195]  bag-info.txt
        ├── [   55]  bagit.txt
        ├── [   86]  data
        │   ├── [  107]  logs
        │   │   ├── [   36]  fileFormatIdentification.log
        │   │   ├── [ 1.2K]  filenameChanges.log
        │   │   ├── [  124]  FileUUIDs.log
        │   │   └── [   60]  transfers
        │   │       └── [   18]  oneimage2-d4a5c1ca-3393-4da8-bb7d-e2cad03b8175
        │   │           └── [   90]  logs
        │   │               ├── [  599]  fileFormatIdentification.log
        │   │               ├── [  143]  filenameChanges.log
        │   │               └── [   61]  FileUUIDs.log
        │   ├── [  55K]  METS.74cad4f0-d1d4-467c-88f8-f29f008d6387.xml
        │   └── [  122]  objects
        │       ├── [  23K]  lion-2ffb671c-c541-48d5-bd61-19a16c2461ee.svg
        │       ├── [  18K]  lion.svg
        │       ├── [   23]  metadata
        │       │   └── [   60]  transfers
        │       │       └── [   32]  oneimage2-d4a5c1ca-3393-4da8-bb7d-e2cad03b8175
        │       │           └── [  153]  directory_tree.txt
        │       └── [   69]  submissionDocumentation
        │           └── [   22]  transfer-oneimage2-d4a5c1ca-3393-4da8-bb7d-e2cad03b8175
        │               └── [  16K]  METS.xml
        ├── [ 1.4K]  manifest-sha256.txt
        └── [  238]  tagmanifest-sha256.txt

Note that there are two "oneimage2-xxxxx" directory levels (one named with the original AIP UUID and one with the UUID assigned for the reingest). This double directory level is what causes the reingest process to fail.

When a full reingest is triggered, the AIP files are copied/moved between locations following roughly the flow below (assuming a compressed AIP; ref. SS code for locations.models.package:package:start_reingest() here and AM code for archivematica.dashboard:views:reingest() here). A rough sketch of the paths involved in steps 1) and 2) follows the list.

1) The SS copies the AIP from the AIP store and extracts it to the Storage Service internal processing location.
2) The SS moves the extracted AIP files to the staging path of the respective pipeline FS.
3) The SS moves the extracted AIP files to the "currently processing" location of the corresponding pipeline.
4) A reingest API call is made to the pipeline. The pipeline moves the extracted AIP files from the currently processing location to the pipeline watched directory.
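For orientation, here is a rough, hedged sketch (Python, illustration only, not Storage Service code) of where the extracted AIP directory sits after steps 1) and 2). The paths are taken from the log excerpt further below; the exact paths for steps 3) and 4) depend on the pipeline configuration, so they are only described in comments.

```python
# Rough trace of the extracted AIP directory during a full reingest
# (illustration only, not Storage Service code). The step 1) and 2) paths
# come from the log excerpt below in this report.
extracted_aip = "oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387"

# 1) Extracted into the Storage Service internal processing location:
step1_path = f"/var/archivematica/storage_service/tmpztdld58u/{extracted_aip}"

# 2) Moved (via rsync) to the staging path of the pipeline FS space; this is
#    the step that introduces the duplicated directory level:
step2_path = f"/mnt/pfs_staging/tmpztdld58u/{extracted_aip}"

# 3) and 4) then move the files on to the pipeline's "currently processing"
#    location and finally into the watched directory shown at the top of this
#    report; their exact paths depend on how the pipeline is configured.
```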

It looks like the problematic step is 2) above. Looking at the logs, the move is implemented via rsync, and the rsync invocation is what creates the problematic double directory level: the source argument does not end with a trailing "/", so rsync copies the source directory itself, including the enclosing directory name, into the destination (a standalone demonstration follows the log excerpt):

DEBUG   2024-02-09 11:23:48  locations.models.package:package:start_reingest:2257:  Reingest: extracted to /var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387
INFO    2024-02-09 11:23:48  locations.models.package:package:start_reingest:2331:  Reingest: files: ['tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387']
DEBUG   2024-02-09 11:23:48  locations.models.package:package:start_reingest:2347:  Reingest: Current location: 413d6e6f-dcad-4937-8179-544a2020c28d: var/archivematica/storage_service (Storage Service Internal Processing)
DEBUG   2024-02-09 11:23:48  locations.models.space:space:move_to_storage_service:341:  TO: src: var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387
DEBUG   2024-02-09 11:23:48  locations.models.space:space:move_to_storage_service:342:  TO: dst: tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387
DEBUG   2024-02-09 11:23:48  locations.models.space:space:move_to_storage_service:343:  TO: staging: /mnt/pfs_staging
INFO    2024-02-09 11:23:48  locations.models.space:space:move_rsync:545:  Moving from /var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387 to /mnt/pfs_staging/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387
INFO    2024-02-09 11:23:48  locations.models.space:space:move_rsync:587:  rsync command: ['rsync', '-t', '-O', '--protect-args', '-vv', '--chmod=Fug+rw,o-rwx,Dug+rwx,o-rwx', '-r', '/var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387', '/mnt/pfs_staging/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387']
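
The trailing-slash behaviour can be reproduced entirely outside Archivematica. Below is a minimal, self-contained demonstration (all directory names are made up); it shows how omitting the trailing "/" on the rsync source nests the source directory inside an already-named destination, which is exactly the duplicated level seen in the watched directory:

```python
# Standalone demonstration of rsync's trailing-slash semantics (nothing
# Archivematica-specific; all paths are temporary and made up).
import subprocess
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "aip-src"
    (src / "data").mkdir(parents=True)
    (src / "data" / "file.txt").write_text("hello")

    # Both destinations are pre-created and already named after the package,
    # mirroring the dst argument of the rsync command in the log above.
    dst_nested = Path(tmp) / "dst-nested" / "aip-src"
    dst_flat = Path(tmp) / "dst-flat" / "aip-src"
    dst_nested.mkdir(parents=True)
    dst_flat.mkdir(parents=True)

    # No trailing slash on the source: rsync copies the "aip-src" directory
    # itself into the destination, yielding .../aip-src/aip-src/data/file.txt
    # (the duplicated level described in this issue).
    subprocess.run(["rsync", "-r", str(src), str(dst_nested)], check=True)

    # Trailing slash on the source: rsync copies only the *contents* of
    # aip-src, yielding the expected .../aip-src/data/file.txt layout.
    subprocess.run(["rsync", "-r", f"{src}/", str(dst_flat)], check=True)

    for hit in sorted((Path(tmp) / "dst-nested").rglob("file.txt")):
        print(hit.relative_to(tmp))  # dst-nested/aip-src/aip-src/data/file.txt
    for hit in sorted((Path(tmp) / "dst-flat").rglob("file.txt")):
        print(hit.relative_to(tmp))  # dst-flat/aip-src/data/file.txt
```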

In some installations the SS internal processing location is the same as the pipeline FS staging path (e.g., both are set to /var/archivematica/storage_service); in that case step 2) above does not trigger a move (i.e., rsync is not run) and the problem is avoided. Setting the pipeline FS staging path to the same value as the SS internal processing location is therefore probably the simplest workaround for this issue until the bug is fixed; a small sketch of the observed behaviour follows.
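
For reference, a minimal sketch of the behaviour described above (illustration only, not the actual Storage Service code): the step 2) move is effectively a no-op when the source and destination resolve to the same path, which is why aligning the two locations keeps the problematic rsync call from running.

```python
# Illustration only (not Storage Service code): with the workaround applied,
# the step 2) source and destination resolve to the same directory, so no
# rsync move -- and therefore no duplicated directory level -- occurs.
import os


def needs_move(source_path: str, destination_path: str) -> bool:
    """Return True if the extracted AIP actually has to be rsynced."""
    return os.path.normpath(source_path) != os.path.normpath(destination_path)


internal_processing = "/var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387"
staging_path = "/var/archivematica/storage_service/tmpztdld58u/oneimage2-74cad4f0-d1d4-467c-88f8-f29f008d6387"

assert not needs_move(internal_processing, staging_path)  # nothing to move, bug avoided
```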


For Artefactual use:

Before you close this issue, you must check off the following: