archivematica / Issues

Issues repository for the Archivematica project
GNU Affero General Public License v3.0

Problem: AIP pointer file validation is failing #1292

Open tw4l opened 4 years ago

tw4l commented 4 years ago

Expected behaviour

AIP pointer files pass validation.

Current behaviour

All AIP pointer files are failing validation. The Storage Service debug logs are full of messages like:

/var/log/archivematica/storage-service/storage_service_debug.log:153406:ERROR     2020-08-27 10:45:51  locations.models.package:package:create_pointer_file:1450:  Pointer file constructed for 9e6c5762-ba8b-4bd1-86b3-8cc8758aa497 is not valid.

Looking more closely at the logs, we see:

archivematica-storage-service_1  | ERROR     2020-08-27 08:08:58  locations.models.package:package:create_pointer_file:1450:  Pointer file constructed for bb71cef4-41af-41d5-b995-2dac14323730 is not valid.
archivematica-storage-service_1  | Schematron Error(s):
archivematica-storage-service_1  | 1. A techMD mdWrap element must contain a PREMIS object element.
archivematica-storage-service_1  |    test: m:xmlData/p:object
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='techMD' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 2. A techMD mdWrap element MUST contain an XML schema location.
archivematica-storage-service_1  |    test: m:xmlData/p:object/@xsi:schemaLocation
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='techMD' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 3. A techMD mdWrap element MUST have an xsi:type attribute of file.
archivematica-storage-service_1  |    test: m:xmlData/p:object/@xsi:type = 'premis:file'
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='techMD' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 4. A digiprovMD mdWrap element MUST contain an XML schema location.
archivematica-storage-service_1  |    test: @MDTYPE = 'PREMIS:AGENT' or m:xmlData/p:*/@xsi:schemaLocation
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='digiprovMD' and namespace-uri()='http://www.loc.gov/METS/'][1]/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 5. A digiprovMD mdWrap element MUST contain an XML schema location.
archivematica-storage-service_1  |    test: @MDTYPE = 'PREMIS:AGENT' or m:xmlData/p:*/@xsi:schemaLocation
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='digiprovMD' and namespace-uri()='http://www.loc.gov/METS/'][2]/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 6. A PREMIS:EVENT must be represented by a PREMIS event element.
archivematica-storage-service_1  |    test: m:xmlData/p:event
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='digiprovMD' and namespace-uri()='http://www.loc.gov/METS/'][1]/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 7. A PREMIS:EVENT must be represented by a PREMIS event element.
archivematica-storage-service_1  |    test: m:xmlData/p:event
archivematica-storage-service_1  |    location: /*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='amdSec' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='digiprovMD' and namespace-uri()='http://www.loc.gov/METS/'][2]/*[local-name()='mdWrap' and namespace-uri()='http://www.loc.gov/METS/']
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | 
archivematica-storage-service_1  | XMLSchema (xsd) Error(s):
archivematica-storage-service_1  | 

xmllint output gets us a little closer to the problem:

➜ xmllint --schema mets.xsd pointer.e1d88fc7-18d8-4e9d-a59e-e758f114d6c0.xml --noout 
pointer.e1d88fc7-18d8-4e9d-a59e-e758f114d6c0.xml:8: element object: Schemas validity error : Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
pointer.e1d88fc7-18d8-4e9d-a59e-e758f114d6c0.xml:8: element object: Schemas validity error : Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
pointer.e1d88fc7-18d8-4e9d-a59e-e758f114d6c0.xml fails to validate
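
The xmllint message is about how the xsi:type value is resolved: the prefix in a value such as premis:file is expanded against the in-scope namespace declarations, and the validator then looks up the resulting qualified name among the types it has loaded. As a rough diagnostic sketch using only the Python standard library, the helper below performs the same expansion so the declared namespace can be inspected. The function name resolve_xsi_types and the SAMPLE fragment are hypothetical and are not taken from Archivematica or from a real pointer file.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical, heavily trimmed fragment for illustration only;
# a real pointer file contains far more structure than this.
SAMPLE = b"""<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:premis="http://www.loc.gov/premis/v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <premis:object xsi:type="premis:file"/>
</mets:mets>"""


def resolve_xsi_types(xml_bytes):
    """Return (element tag, namespace URI, local name) for each xsi:type found."""
    scopes = []    # stack of in-scope (prefix, uri) declarations
    resolved = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            scopes.append(payload)      # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            scopes.pop()                # the declaration went out of scope
        else:
            value = payload.get(XSI_TYPE)
            if value is None:
                continue
            prefix, _, local = value.rpartition(":")
            # The innermost declaration of the prefix wins.
            uri = next((u for p, u in reversed(scopes) if p == prefix), None)
            resolved.append((payload.tag, uri, local))
    return resolved


for tag, uri, local in resolve_xsi_types(SAMPLE):
    print(tag, "->", "{%s}%s" % (uri, local))
```

Run against an actual pointer file, this shows which PREMIS namespace the xsi:type value expands to, which can then be compared with the schemas available to the validator.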

This does not cause the ingest to fail, because the validation results are not passed to MCP Server. The AIP is stored successfully, and the pointer file is created and otherwise appears fine.

Steps to reproduce

Your environment (version of Archivematica, operating system, other relevant details)

qa/1.x (Archivematica) / qa/0.x (Storage Service), pre-1.12 release

Additional details

This isn't entirely new: the same error message appears in https://github.com/archivematica/Issues/issues/380, where it was noted as needing investigation but was tangential to the main problem described there.



replaceafill commented 1 year ago

I investigated this today and found that it is caused by mets-reader-writer's Schematron files (used for METS and pointer file validation) still targeting PREMIS 2.2, while the Storage Service uses PREMIS 3.0.
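
To illustrate why a version mismatch like this would make rules such as "A techMD mdWrap element must contain a PREMIS object element" fail even when the element is present, here is a minimal sketch using only the Python standard library. It assumes, going by the comment above rather than by an inspection of the Schematron source, that the rules bind the p: prefix to the PREMIS 2 namespace (info:lc/xmlns/premis-v2). The SAMPLE fragment and the rule_passes helper are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS_2 = "info:lc/xmlns/premis-v2"        # namespace of PREMIS 2.x
PREMIS_3 = "http://www.loc.gov/premis/v3"   # namespace of PREMIS 3.0

# Hypothetical trimmed mdWrap holding a PREMIS 3 object element.
SAMPLE = """<m:mdWrap xmlns:m="http://www.loc.gov/METS/" MDTYPE="PREMIS:OBJECT">
  <m:xmlData>
    <premis:object xmlns:premis="http://www.loc.gov/premis/v3"/>
  </m:xmlData>
</m:mdWrap>"""


def rule_passes(md_wrap, premis_ns):
    """Evaluate the test path m:xmlData/p:object with p bound to premis_ns."""
    namespaces = {"m": METS, "p": premis_ns}
    return md_wrap.find("m:xmlData/p:object", namespaces) is not None


md_wrap = ET.fromstring(SAMPLE)
print(rule_passes(md_wrap, PREMIS_2))  # False: p:object is looked up in the v2 namespace
print(rule_passes(md_wrap, PREMIS_3))  # True: the element lives in the v3 namespace
```

Because XPath matches on namespace URI rather than on prefix, every test that goes through p: would miss PREMIS 3 elements under this assumption, which is consistent with all seven Schematron errors in the log above.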