
Problem: Transfers silently fail and create a partial transfer when Currently Processing location runs out of space #325

Open jlj5aj opened 6 years ago

jlj5aj commented 6 years ago

Expected behaviour
When a transfer causes the Currently Processing location to run out of disk space, the resulting error should a) cause the transfer to fail (with appropriate cleanup) and b) be surfaced to the user on the dashboard.

Current behaviour
A transfer that runs out of disk space while copying files from the transfer source into the Currently Processing location fails silently and still creates a new transfer in the dashboard for approval. (Relatedly, the approval step does not show any information about the files included in the transfer, which could help in deciding whether or not to approve it.)
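
A pre-flight free-space check is one way the expected behaviour could be achieved. The sketch below is hypothetical, not existing Archivematica or Storage Service code; transfer_size and has_enough_space are made-up helper names, and the idea is simply to compare the size of the source against the free space on the destination filesystem before any copying starts.

import os

def transfer_size(source_path):
    """Total size in bytes of all regular files under source_path."""
    total = 0
    for root, _dirs, files in os.walk(source_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def has_enough_space(source_path, destination_path, headroom=1.1):
    """True if the destination filesystem can hold the transfer plus some headroom."""
    stats = os.statvfs(destination_path)
    free_bytes = stats.f_bavail * stats.f_frsize
    return transfer_size(source_path) * headroom <= free_bytes

A check along these lines, run before the move is requested, would let the dashboard report "not enough space" up front instead of leaving a partial transfer behind.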

Steps to reproduce

  1. Start a transfer that is larger than the remaining disk space in the Currently Processing location
  2. See the transfer show up in the dashboard for approval
  3. Verify that the copy within the Currently Processing location is only partially complete

Your environment (version of Archivematica, OS version, etc.)
Archivematica 1.7.2, CentOS 7.5



jlj5aj commented 6 years ago

Here is some info from a sample disk space exhaustion failure.

Relevant storage service log entries:

DEBUG     2018-11-08 15:09:10  locations.models.space:space:posix_move:281:  posix_move: source_path: path/to/content/sourcedir/.
DEBUG     2018-11-08 15:09:10  locations.models.space:space:posix_move:282:  posix_move: destination_path: /var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer/.
DEBUG     2018-11-08 15:09:10  locations.models.space:space:posix_move:283:  posix_move: destination_space.path: /
INFO      2018-11-08 15:09:10  locations.models.space:space:move_rsync:469:  Moving from /path/to/content/sourcedir/. to /var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer/.
DEBUG     2018-11-08 15:09:10  locations.models.space:space:move_rsync:483:  os.rename failed, trying with normalized paths
DEBUG     2018-11-08 15:09:10  locations.models.space:space:move_rsync:492:  os.rename failed, falling back to rsync. Source: /path/to/content/sourcedir; Destination: /var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer
INFO      2018-11-08 15:09:10  locations.models.space:space:move_rsync:499:  rsync command: ['rsync', '-t', '-O', '--protect-args', '-vv', '--chmod=Fug+rw,o-rwx,Dug+rwx,o-rwx', '-r', '/path/to/content/sourcedir/.', '/var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer/.']
WARNING   2018-11-08 15:11:25  locations.models.space:space:move_rsync:507:  Rsync failed with status 11: sending incremental file list
delta-transmission disabled for local transfer or --whole-file
<snip file listing>
rsync: write failed on "/var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer/somefile": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]

ERROR     2018-11-08 15:11:25  django.request.tastypie:resources:_handle_500:301:  Internal Server Error: /api/v2/location/b5bb1db5-e34a-40cf-96b2-2a1d84a6d022/
Traceback (most recent call last):
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python2.7/site-packages/tastypie/resources.py", line 220, in wrapper
    response = callback(request, *args, **kwargs)
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python2.7/site-packages/tastypie/resources.py", line 460, in dispatch_detail
    return self.dispatch('detail', request, **kwargs)
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python2.7/site-packages/tastypie/resources.py", line 483, in dispatch
    response = method(request, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 501, in post_detail
    return self._handle_location_file_move(move_files, request, *args, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 456, in _handle_location_file_move
    return move_files_fn(data['files'], origin_location, destination_location)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 495, in move_files
    self._move_files_between_locations(files, origin_location, destination_location)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 379, in _move_files_between_locations
    package=None
  File "/usr/lib/archivematica/storage-service/locations/models/space.py", line 293, in posix_move
    source_path, abs_destination_path, destination_space, package)
  File "/usr/lib/archivematica/storage-service/locations/models/local_filesystem.py", line 62, in posix_move
    return self.space.move_rsync(source_path, destination_path, try_mv_local=True)
  File "/usr/lib/archivematica/storage-service/locations/models/space.py", line 508, in move_rsync
    raise StorageException(s)
StorageException: Rsync failed with status 11: sending incremental file list
delta-transmission disabled for local transfer or --whole-file
<snip file listing>
rsync: write failed on "/var/archivematica/sharedDirectory/tmp/tmpd9MukX/largetransfer/somefile": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]
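
The traceback shows the StorageException surfacing as an HTTP 500 from the Storage Service location endpoint shown above, yet the transfer still appears for approval. Below is a minimal, illustrative sketch of caller-side handling; move_into_processing and the surrounding flow are hypothetical and not actual dashboard code, but they show the kind of check that would turn the 500 into a failed transfer rather than a silent partial copy.

import requests

def move_into_processing(ss_url, location_uuid, payload, api_headers):
    """Request a move via the Storage Service location API and fail loudly on error."""
    response = requests.post(
        "{}/api/v2/location/{}/".format(ss_url, location_uuid),
        json=payload,
        headers=api_headers,
    )
    if response.status_code >= 500:
        # e.g. rsync exit status 11 / ENOSPC raised as StorageException server-side
        raise RuntimeError("Storage Service move failed: {}".format(response.text))
    return response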

Disk usage and file counts after the transfer fails but shows up in the dashboard for approval:

# for dir in /path/to/content/sourcedir /var/archivematica/sharedDirectory/watchedDirectories/activeTransfers/standardTransfer/largetransfer; do echo "$(du -sh "$dir") ($(find "$dir" -type f | grep -c .) files)"; done
14G     /path/to/content/sourcedir (13484 files)
116M    /var/archivematica/sharedDirectory/watchedDirectories/activeTransfers/standardTransfer/largetransfer (22 files)
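
For the "appropriate cleanup" part of the expected behaviour, the partial destination tree would also need to be removed when the move fails. A minimal sketch, with move_with_cleanup as a hypothetical wrapper around whatever actually performs the copy (the posix_move/move_rsync path in the traceback):

import shutil

def move_with_cleanup(move_fn, source_path, destination_path):
    """Run the move; on failure, delete the partial copy before re-raising."""
    try:
        return move_fn(source_path, destination_path)
    except Exception:
        shutil.rmtree(destination_path, ignore_errors=True)
        raise
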
sromkey commented 4 years ago

This seems worth re-testing after 1.11 is released, given all of the performance improvements there; assuming that this problem doesn't persist, the PR can probably be closed.