archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Arkivum integration broken for compressed AIPs #735

Open mjaddis opened 5 years ago

mjaddis commented 5 years ago

Expected behaviour

If I create a compressed AIP, it should be possible to store it in an Arkivum Space.

Current behaviour

The Storage Service throws an error when the AIP is saved because arkivum.py can't find the pointer file:

ERROR     2019-06-10 02:18:59  locations.models.async_manager:async_manager:wrapper:122:  Task threw an error: Error reading file '/var/archivematica/storage_service/24e5/2ba5/efac/4901/889f/e6ca/58b3/d385/pointer.24e52ba5-efac-4901-889f-e6ca58b3d385.xml': failed to load external entity "/var/archivematica/storage_service/24e5/2ba5/efac/4901/889f/e6ca/58b3/d385/pointer.24e52ba5-efac-4901-889f-e6ca58b3d385.xml"
Traceback (most recent call last):
  File "/usr/lib/archivematica/storage-service/locations/models/async_manager.py", line 119, in wrapper
    value = task_fn(*args, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 705, in task
    self._store_bundle(bundle)
  File "/usr/lib/archivematica/storage-service/locations/api/resources.py", line 679, in _store_bundle
    premis_agents=agents, aip_subtype=aip_subtype)
  File "/usr/lib/archivematica/storage-service/locations/models/package.py", line 610, in store_aip
    storage_effects, checksum = self._store_aip_to_uploaded(v, related_package_uuid)
  File "/usr/lib/archivematica/storage-service/locations/models/package.py", line 777, in _store_aip_to_uploaded
    package=self)
  File "/usr/lib/archivematica/storage-service/locations/models/space.py", line 406, in post_move_from_storage_service
    *args, **kwargs)
  File "/usr/lib/archivematica/storage-service/locations/models/arkivum.py", line 114, in post_move_from_storage_service
    root = etree.parse(package.full_pointer_file_path)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:81117)
  File "src/lxml/parser.pxi", line 1811, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:117848)
  File "src/lxml/parser.pxi", line 1837, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:118195)
  File "src/lxml/parser.pxi", line 1741, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:117107)
  File "src/lxml/parser.pxi", line 1138, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:111653)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105109)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106817)
  File "src/lxml/parser.pxi", line 633, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105628)

The problem arises because the pointer file hasn't been saved to disk before arkivum.py tries to access it. arkivum.py accesses the pointer file so it can get the checksum for the AIP file and use that in an API call to the Arkivum appliance. However, looking at the package.py code (store_aip method), the storage of the pointer file doesn't happen until after _store_aip_to_uploaded() has been called.

        v = self._store_aip_to_pending(origin_location, origin_path)
        storage_effects, checksum = self._store_aip_to_uploaded(v, related_package_uuid)
        self._store_aip_ensure_pointer_file(
            v, checksum, premis_events=premis_events,
            premis_agents=premis_agents, aip_subtype=aip_subtype)

The pointer file needs to be accessible to arkivum.py before arkivum.py tries to save the AIP. Alternatively, another approach is needed to get the AIP checksum to arkivum.py so it can use it without having to load up the pointer file.
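As a rough illustration of the second option, the sketch below computes the checksum directly from the AIP file and hands it to the post-move step, so that step never has to parse the pointer file. All names here (`compute_checksum`, `post_move_hook`, `store_aip`) are hypothetical stand-ins, not the Storage Service's actual API, and SHA-256 is an assumed algorithm; the real code would need to use whatever checksum type the Arkivum API expects.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash the file at `path` in chunks so a large AIP is not read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def post_move_hook(aip_path, checksum=None):
    """Hypothetical stand-in for the Arkivum post-move step.

    Uses the checksum supplied by the caller and falls back to hashing the
    AIP itself, so it does not depend on a pointer file existing on disk.
    """
    if checksum is None:
        checksum = compute_checksum(aip_path)
    # A real implementation would pass the checksum to the Arkivum appliance here.
    return {"path": aip_path, "checksum": checksum}


def store_aip(aip_path):
    """Hypothetical simplified ordering: checksum first, then the hook."""
    checksum = compute_checksum(aip_path)
    result = post_move_hook(aip_path, checksum=checksum)
    # The pointer file could be written after this point without breaking the hook.
    return result
```

With this shape, the order in which the pointer file is written no longer matters to the Arkivum step, at the cost of threading the checksum through the call chain (or hashing the AIP a second time in the fallback path).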

Steps to reproduce

  1. Create a new Space using the Arkivum protocol. Using a local directory on the SS is sufficient to recreate the problem above - you don't need a real Arkivum appliance.
  2. Create a new AIP storage location using the Arkivum Space.
  3. Create and store a compressed AIP to the new storage location.

Your environment (version of Archivematica, OS version, etc)

We see this problem on AM 1.8.1, but the issue may have arisen in 1.7. It doesn't arise in AM 1.6.



ross-spencer commented 5 years ago

@mjaddis is there any chance you can test on 1.9.1? Some pointer file creation issues were noticed there that, as you note, may have been around since 1.7. If you follow the links through this PR you'll find some more info about that and its diagnosis.

mjaddis commented 5 years ago

Hi @ross-spencer, I've just tried 1.9.1 and get the same result. I saw artefactual/archivematica-storage-service#443, but I'm not sure that's the same problem. As far as I can tell, 443 is about the SS deciding that no pointer file needs to be created, because while the AIP is still in the Dashboard the SS can't see it and so can't test whether it's a file or not. The problem I have is different: the SS does correctly decide that a pointer file is needed, but it doesn't generate it at the right time. I checked this with some extra logging, which confirms that should_have_pointer is true, already_generated_ptr_exists is false, and pointer_file_dst is set.

ross-spencer commented 5 years ago

Thanks @mjaddis, that helps! 👍