archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: Write-only replicator fails if the transfer name of an uncompressed AIP contains dots #1697

Open replaceafill opened 1 month ago

replaceafill commented 1 month ago

Expected behaviour

Write-only replicator locations can store AIP replicas independently of the compression algorithm set in the pipeline.

Current behaviour

Replicator locations set in a Write-Only Replica Staging space fail if the AIP is uncompressed and the transfer name contains dots.

This is the Storage Service log for this scenario, using the transfer name foobar.dot:

Storage Service logs

```console
DEBUG 2024-06-05 13:16:49 locations.models.space:space:move_from_storage_service:393: FROM: src: 2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3
DEBUG 2024-06-05 13:16:49 locations.models.space:space:move_from_storage_service:394: FROM: dst: var/archivematica/sharedDirectory/www/write-only-replicas/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3/
INFO 2024-06-05 13:16:49 common.utils:utils:create_tar:604: creating archive of foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3 at /var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.tar, relative to /var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3
INFO 2024-06-05 13:16:49 locations.models.space:space:move_rsync:520: Moving from /var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar to /var/archivematica/sharedDirectory/www/write-only-replicas/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar
INFO 2024-06-05 13:16:49 locations.models.space:space:move_rsync:562: rsync command: ['rsync', '-t', '-O', '--protect-args', '-vv', '--chmod=Fug+rw,o-rwx,Dug+rwx,o-rwx', '-r', '/var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar', '/var/archivematica/sharedDirectory/www/write-only-replicas/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar']
WARNING 2024-06-05 13:16:49 locations.models.space:space:move_rsync:570: Rsync failed with status 23: b'sending incremental file list\nrsync: [sender] link_stat "/var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar" failed: No such file or directory (2)\ndelta-transmission disabled for local transfer or --whole-file\ntotal: matches=0 hash_hits=0 false_alarms=0 data=0\n\nsent 19 bytes received 79 bytes 196.00 bytes/sec\ntotal size is 0 speedup is 0.00\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.2.7]\n'
ERROR 2024-06-05 13:16:49 django.request.tastypie:resources:_handle_500:309: Internal Server Error: /api/v2/file/
Traceback (most recent call last):
  File "/pyenv/data/versions/3.9.19/lib/python3.9/site-packages/tastypie/resources.py", line 221, in wrapper
    response = callback(request, *args, **kwargs)
  File "/pyenv/data/versions/3.9.19/lib/python3.9/site-packages/tastypie/resources.py", line 456, in dispatch_list
    return self.dispatch('list', request, **kwargs)
  File "/pyenv/data/versions/3.9.19/lib/python3.9/site-packages/tastypie/resources.py", line 488, in dispatch
    response = method(request, **kwargs)
  File "/pyenv/data/versions/3.9.19/lib/python3.9/site-packages/tastypie/resources.py", line 1399, in post_list
    updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))
  File "/src/storage_service/locations/api/resources.py", line 1135, in obj_create
    self._store_bundle(bundle)
  File "/src/storage_service/locations/api/resources.py", line 1066, in _store_bundle
    bundle.obj.store_aip(
  File "/src/storage_service/locations/models/package.py", line 868, in store_aip
    self.create_replicas()
  File "/src/storage_service/locations/models/package.py", line 1538, in create_replicas
    self.replicate(replicator_loc)
  File "/src/storage_service/locations/models/package.py", line 725, in replicate
    replica_storage_effects = dest_space.move_from_storage_service(
  File "/src/storage_service/locations/models/space.py", line 401, in move_from_storage_service
    return child_space.move_from_storage_service(
  File "/src/storage_service/locations/models/replica_staging.py", line 58, in move_from_storage_service
    return self._store_tar_replica(src_path, dest_path, package)
  File "/src/storage_service/locations/models/replica_staging.py", line 76, in _store_tar_replica
    self.space.move_rsync(tar_src_path, tar_dest_path)
  File "/src/storage_service/locations/models/space.py", line 571, in move_rsync
    raise StorageException(s)
locations.models.StorageException: Rsync failed with status 23: b'sending incremental file list\nrsync: [sender] link_stat "/var/archivematica/storage_service/2ffe/ccae/4465/4fe5/8dab/6733/e0b4/ccb3/foobar.dot-2ffeccae-4465-4fe5-8dab-6733e0b4ccb3.tar" failed: No such file or directory (2)\ndelta-transmission disabled for local transfer or --whole-file\ntotal: matches=0 hash_hits=0 false_alarms=0 data=0\n\nsent 19 bytes received 79 bytes 196.00 bytes/sec\ntotal size is 0 speedup is 0.00\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.2.7]\n'
ERROR 2024-06-05 13:16:49 django.request:log:log_response:241: Internal Server Error: /api/v2/file/
DEBUG 2024-06-05 13:16:50 locations.api.resources:resources:hydrate_current_location:1019: `current_location` was not matched by `default_location_regex`
```

Steps to reproduce

  1. Set a Write-Only Replica Staging space and create a Replicator location in it.
  2. Set the replicator location in your AIP Storage location.
  3. Start a transfer and add a dot in the middle of its name, e.g. foobar.dot
  4. In the Ingest tab, the Store the AIP job of the Store AIP microservice will fail.
  5. The Storage Service logs contain an error like the one above.

Your environment (version of Archivematica, operating system, other relevant details)

https://github.com/artefactual/archivematica/commit/0232d9bfd7b82e385b15cebba6cb9a6ed85ac9f1
https://github.com/artefactual/archivematica-storage-service/commit/86886b5c6a88430067936500455567a5f071fc7c

This is a regression introduced in https://github.com/archivematica/Issues/issues/1622


antonar commented 3 weeks ago

I can confirm that this is the same exception we're seeing in the storage service logs at NHA. The transfers are created by Enduro, which is picking up tar archived SIPs from our pre-ingest tool, so storage service should probably be able to handle the dot.

klavman commented 3 weeks ago

The replica storage failure is caused by the with_suffix call in storage_service/common/utils.py, which I modified in a previous update. This is how with_suffix behaves:

Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.

In: example_path = Path("/home/test/foobar.dot-in-name")
In: example_path.with_suffix(".tar")
Out: PosixPath('/home/test/foobar.tar')

The fix I am considering is to convert the path to a string and append the extension, since I see no straightforward way to do this with Path alone:

In: tarpath = Path(f"{example_path}.tar")
In: tarpath
Out: PosixPath('/home/test/foobar.dot-in-name.tar')
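
For comparison, here is a minimal sketch that puts both behaviours side by side and adds a pathlib-only alternative based on with_name. The helper name append_suffix is hypothetical and is not taken from the Storage Service code; PurePosixPath is used only so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath


def append_suffix(path: PurePosixPath, suffix: str) -> PurePosixPath:
    """Hypothetical helper: append `suffix` to the final path component.

    Unlike with_suffix, this leaves any dots already in the name untouched.
    """
    return path.with_name(path.name + suffix)


example_path = PurePosixPath("/home/test/foobar.dot-in-name")

# with_suffix treats everything after the last dot (".dot-in-name")
# as the existing suffix and replaces it.
replaced = example_path.with_suffix(".tar")

# Appending via string formatting keeps the full name.
via_string = PurePosixPath(f"{example_path}.tar")

# Appending via with_name also keeps the full name, without leaving pathlib.
via_with_name = append_suffix(example_path, ".tar")
```

Either appending approach yields a name that still matches the one the replicator later passes to rsync.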

replaceafill commented 1 week ago

Thank you for sharing your findings @klavman. That's exactly how I've fixed this in https://github.com/artefactual/archivematica-storage-service/pull/724 after adding a failing test for this specific case.
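
A regression check for this case could look roughly like the sketch below. It is not the test from the pull request: tar_path_for is a hypothetical stand-in for whatever code computes the tar path, and the transfer names other than foobar.dot are made up for illustration.

```python
from pathlib import PurePosixPath

# UUID taken from the log in this issue; any UUID would do.
AIP_UUID = "2ffeccae-4465-4fe5-8dab-6733e0b4ccb3"


def tar_path_for(aip_path: PurePosixPath) -> PurePosixPath:
    """Hypothetical stand-in: append ".tar" rather than replacing the suffix."""
    return PurePosixPath(f"{aip_path}.tar")


def check_tar_path_keeps_dots() -> None:
    """The tar name must be the AIP directory name plus ".tar", dots included."""
    for transfer_name in ("foobar", "foobar.dot", "a.b.c"):
        aip_path = PurePosixPath("/tmp") / f"{transfer_name}-{AIP_UUID}"
        expected = f"{transfer_name}-{AIP_UUID}.tar"
        assert tar_path_for(aip_path).name == expected, transfer_name


check_tar_path_keeps_dots()
```

Swapping the body of tar_path_for for aip_path.with_suffix(".tar") makes the check fail on the names that contain dots, which is the mismatch visible in the log above.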