artefactual-labs / mets-reader-writer

Library to parse and create METS files, especially for Archivematica.
https://mets-reader-writer.readthedocs.io
GNU Affero General Public License v3.0

METS is invalid according to XMLstarlet due to PREMIS - How do ye validate? #91

Open kieranjol opened 3 years ago

kieranjol commented 3 years ago

Hi, SUMMARY: When I validate the METS output of Archivematica against the mets.xsd schema, xmlstarlet reports it as invalid. When I create a custom XSD that imports both the METS and PREMIS schemas, then all is well. How do ye validate your XML as part of your dev process?

ISSUE: This seems to relate specifically to the PREMIS type definitions (the xsi:type attributes on premis:object elements); when I remove some of the extra namespace info from the PREMIS data, it validates just fine. Perhaps xmlstarlet isn't the best tool for this type of operation?

How to replicate: I took a METS XML file from the current Archivematica sandbox and uploaded it here: https://gist.github.com/kieranjol/43f3d977306e3740daefaa284cc2d565 I validated it against the METS XSD from here: https://www.loc.gov/standards/mets/mets.xsd and the result is at the end of this issue. Eventually I found this question from the PREMISv2 days, which appears to describe a similar issue: https://stackoverflow.com/questions/26712645/xml-type-definition-is-absent

I edited the example in the answer and created a new XSD containing the following; your XML output validates against it just fine.


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           elementFormDefault="qualified"> 

  <xs:import namespace="http://www.loc.gov/METS/"
     schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"
  />

  <xs:import namespace="http://www.loc.gov/premis/v3"
    schemaLocation="http://www.loc.gov/standards/premis/v3/premis.xsd"
  />
</xs:schema>
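
For anyone who wants to generate a wrapper schema like the one above rather than hand-edit it, here is a minimal Python sketch using only the standard library. The function name `build_wrapper_xsd` is hypothetical (it is not part of metsrw or xmlstarlet), and the namespace/location pairs are simply the ones from the XSD above. It only writes the wrapper; it does not perform any validation itself.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def build_wrapper_xsd(imports):
    """Return a wrapper XSD (as a string) with one xs:import per entry.

    `imports` maps a target namespace to its schemaLocation.
    Hypothetical helper for illustration only.
    """
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema", {"elementFormDefault": "qualified"})
    for namespace, location in imports.items():
        ET.SubElement(
            schema,
            f"{{{XS}}}import",
            {"namespace": namespace, "schemaLocation": location},
        )
    ET.indent(schema)  # pretty-print; requires Python 3.9+
    body = ET.tostring(schema, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    print(build_wrapper_xsd({
        "http://www.loc.gov/METS/": "http://www.loc.gov/standards/mets/mets.xsd",
        "http://www.loc.gov/premis/v3": "http://www.loc.gov/standards/premis/v3/premis.xsd",
    }))
```

Saving the output to a file and passing that file to `xml val -e -s` in place of mets.xsd should reproduce the validation described here, assuming the schema locations are reachable from the machine doing the validation.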

And here's the error I got when validating the Archivematica METS against the original mets.xsd:

xml val -e -s mets.xsd  ..\Downloads\METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:7.72: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}intellectualEntity' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:7.72: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:21.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:21.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:549.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:549.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:744.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:744.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1105.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1105.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1370.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1370.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1635.74: Element '{http://www.loc.gov/premis/v3}object', attribute '{http://www.w3.org/2001/XMLSchema-instance}type': The QName value '{http://www.loc.gov/premis/v3}file' of the xsi:type attribute does not resolve to a type definition.
../Downloads/METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml:1635.74: Element '{http://www.loc.gov/premis/v3}object': The type definition is absent.
..\Downloads\METS.56006c7d-77ce-462a-b20f-35650ca66e52.xml - invalid
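
Since the log above repeats the same two messages for every premis:object, it may help to boil it down to the distinct xsi:type values that failed to resolve. The following is a rough sketch, assuming the message wording stays exactly as in the output above; the function name `unresolved_types` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "does not resolve" message exactly as it appears in the log above.
UNRESOLVED_TYPE = re.compile(
    r"The QName value '\{(?P<namespace>[^}]*)\}(?P<name>[^']+)' "
    r"of the xsi:type attribute does not resolve to a type definition"
)


def unresolved_types(log_text):
    """Count unresolved xsi:type values, keyed by (namespace, local name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNRESOLVED_TYPE.search(line)
        if match:
            counts[(match["namespace"], match["name"])] += 1
    return counts
```

Run over the log above, this would report `intellectualEntity` once and `file` six times, all in the http://www.loc.gov/premis/v3 namespace, which suggests the failures are confined to PREMIS types that mets.xsd on its own does not load.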

kieranjol commented 3 years ago

OK, so I see the SO link is referenced here as well: https://github.com/artefactual-labs/mets-reader-writer/blob/master/metsrw/validate.py. It seems the long and short of it is that ye are aware of this validation quirk and created workarounds as a result?

WhenSkiesAbove commented 5 months ago

I'm going to bump this, as I'm curious what the response/solution is. Actual issue? Falsely flagged as invalid? Known issue being worked on?

kieranjol commented 5 months ago

I totally forgot writing this, but I appreciate the bump and echo those questions :)

