E-ARK-Software / eark-validator

E-ARK Python Information Package validation library
Apache License 2.0

Incorrect validation of requirement CSIP100 #100

Closed dockmd closed 2 weeks ago

dockmd commented 2 weeks ago

Test case: https://github.com/DILCISBoard/eark-ip-test-corpus/tree/integration/corpus/CSIP/CSIP100/testCase.xml contains the definition of 1 package which should be invalid, but the validator says it is valid.

Valid according to the validator, but should be invalid:

Package: https://github.com/DILCISBoard/eark-ip-test-corpus/tree/integration/corpus/CSIP/CSIP100/invalid/structMap_does_not_point_at_Schemas

Output: struct result is: WellFormed

{"uid":"871dfbec8f7f49e299c9f4f06bc16404","structure":{"status":"WellFormed","messages":[{"rule_id":"CSIPSTR3","severity":"Info","location":"root structMap_does_not_point_at_Schemas","message":"The Information Package MAY be contained in an archive/compressed form, e.g. TAR or ZIP, for storage or transfer. The specific format details should be decided by the interested parties and documented, for example in a submission agreement or statement of access terms."},{"rule_id":"CSIPSTR5","severity":"Warn","location":"root structMap_does_not_point_at_Schemas","message":"The Information Package root folder SHOULD include a folder named metadata, which SHOULD include metadata relevant to the whole package."},{"rule_id":"CSIPSTR12","severity":"Warn","location":"rep1 representation","message":"The representation folder SHOULD include a metadata file named METS.xml which includes information about the identity and structure of the representation and its components. 
The recommended best practice is to always have a METS.xml in the representation folder."},{"rule_id":"CSIPSTR13","severity":"Warn","location":"rep1 representation","message":"The representation folder SHOULD include a sub-folder named metadata which MAY include all metadata about the specific representation."}]},"metadata":{"schema_results":{"status":"VALID","messages":[]},"schematron_results":{"status":"INVALID","messages":[{"rule_id":"CSIP4","severity":"Error","location":"/mets:mets((@csip:CONTENTINFORMATIONTYPE = 'ERMS') or (@csip:CONTENTINFORMATIONTYPE = 'SIARD1') or (@csip:CONTENTINFORMATIONTYPE = 'SIARD2') or (@csip:CONTENTINFORMATIONTYPE = 'SIARDDK') or (@csip:CONTENTINFORMATIONTYPE = 'GeoData') or (@csip:CONTENTINFORMATIONTYPE = 'citscarchival_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'cscarchival_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'citserms_v2_1') or (@csip:CONTENTINFORMATIONTYPE = 'citserms_v3_0') or (@csip:CONTENTINFORMATIONTYPE = 'citspremis_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'cspremis_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'citsehpj_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'citsehpj_v2_0') or (@csip:CONTENTINFORMATIONTYPE = 'citsehcr_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'citssiard_v1_0') or (@csip:CONTENTINFORMATIONTYPE = 'citsgeospatial_v3_0') or (@csip:CONTENTINFORMATIONTYPE = 'MIXED') or (@csip:CONTENTINFORMATIONTYPE = 'OTHER')) and (@csip:CONTENTINFORMATIONTYPE != 'OTHER' or (@csip:CONTENTINFORMATIONTYPE = 'OTHER' and @csip:OTHERCONTENTINFORMATIONTYPE != ''))/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']","message":"Used to declare the Content Information Type Specification used when creating the package. Legal values are defined in a fixed vocabulary. 
The attribute is mandatory for representation level METS documents."},{"rule_id":"CSIP5","severity":"Error","location":"/mets:mets(@csip:CONTENTINFORMATIONTYPE = 'OTHER' and @csip:OTHERCONTENTINFORMATIONTYPE) or @csip:CONTENTINFORMATIONTYPE != 'OTHER'/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']","message":"When the mets/@csip:CONTENTINFORMATIONTYPE has the value “OTHER” the mets/@csip:OTHERCONTENTINFORMATIONTYPE must state the content information type."},{"rule_id":"CSIP17","severity":"Warn","location":"/mets:metsmets:dmdSec/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']","message":"Must be used if descriptive metadata about the package content is available. NOTE: According to official METS documentation each metadata section must describe one and only one set of metadata. As such, if implementers want to include multiple occurrences of descriptive metadata into the package this must be done by repeating the whole dmdSec element for each individual metadata."},{"rule_id":"CSIP31","severity":"Warn","location":"/mets:metsmets:amdSec/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']","message":"If administrative / preservation metadata is available, it must be described using the administrative metadata section (amdSec) element. 
All administrative metadata is present in a single amdSec element."},{"rule_id":"CSIP8","severity":"Warn","location":"/mets:mets/mets:metsHdr@LASTMODDATE/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='metsHdr' and namespace-uri()='http://www.loc.gov/METS/']","message":"The metsHdr element SHOULD have a LASTMODDATE attribute."},{"rule_id":"CSIP114","severity":"Error","location":"/mets:mets/mets:fileSecmets:fileGrp[@USE = 'Representations']/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='fileSec' and namespace-uri()='http://www.loc.gov/METS/']","message":"A pointer to the METS document describing the representation or pointers to the content being transferred must be present in one or more file groups with mets/fileSec/fileGrp/@USE attribute value “Representations”."},{"rule_id":"CSIP105","severity":"Warn","location":"/mets:mets/mets:structMap[@LABEL = 'CSIP']/mets:divmets:div[@LABEL = 'Representations']/mets:div/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='structMap' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='div' and namespace-uri()='http://www.loc.gov/METS/']","message":"When a package consists of multiple representations, each described by a representation level METS.xml document, there should be a discrete representation div element for each representation."},{"rule_id":"CSIP91","severity":"Warn","location":"/mets:mets/mets:structMap[@LABEL = 'CSIP']/mets:div/mets:div[@LABEL = 'Metadata']@ADMID/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='structMap' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='div' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='div' and namespace-uri()='http://www.loc.gov/METS/'][1]","message":"When there is administrative metadata and the amdSec is present, all administrative metadata MUST be referenced via the administrative sections 
different identifiers."},{"rule_id":"CSIP92","severity":"Warn","location":"/mets:mets/mets:structMap[@LABEL = 'CSIP']/mets:div/mets:div[@LABEL = 'Metadata']@DMDID/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='structMap' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='div' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='div' and namespace-uri()='http://www.loc.gov/METS/'][1]","message":"When there are descriptive metadata and one or more dmdSec is present, all descriptive metadata MUST be referenced via the descriptive section identifiers."},{"rule_id":"SIP2","severity":"Error","location":"/mets:mets@PROFILE = 'https://earksip.dilcis.eu/profile/E-ARK-SIP.xml'/*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']","message":"The PROFILE attribute MUST contain the URL of the METS profile, for a SIP: https://earksip.dilcis.eu/profile/E-ARK-SIP.xml."},{"rule_id":"SIP14","severity":"Error","location":"/mets:mets/mets:metsHdr/mets:agent[@ROLE = 'CREATOR']/mets:note@NOTETYPE = 'IDENTIFICATIONCODE'/[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/[local-name()='metsHdr' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='agent' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='note' and namespace-uri()='http://www.loc.gov/METS/']","message":"The creator agent element MUST have a NOTETYPE attribute of value 
IDENTIFICATIONCODE."}]}},"package":{"mets":{"root":{"namespaces":{"":"http://www.loc.gov/METS/","csip":"https://DILCIS.eu/XML/METS/CSIPExtensionMETS","xsi":"http://www.w3.org/2001/XMLSchema-instance","xlink":"http://www.w3.org/1999/xlink"},"objid":"structMap_does_not_point_at_Schemas","label":"","type":"Mixed","profile":"https://earkcsip.dilcis.eu/profile/E-ARK-CSIP.xml"},"file_entries":[{"path":"documentation/Doc1.txt","type":"file","size":"40","checksum":{"algorithm":"MD5","value":"F57DBBDDF87F18043C2029D978749318"},"mimetype":"text/plain","isValid":true,"errors":[]},{"path":"schemas/DILCISExtensionMETS.xsd","type":"file","size":"1633","checksum":{"algorithm":"MD5","value":"E99C19B9CA1271C1D9BAFED19C4BD50A"},"mimetype":"application/xml","isValid":true,"errors":[]},{"path":"schemas/METS.xsd","type":"file","size":"136472","checksum":{"algorithm":"MD5","value":"D303B7A71BA2B4FF0061BDCBA0F152E0"},"mimetype":"application/xml","isValid":true,"errors":[]},{"path":"schemas/xlink.xsd","type":"file","size":"3180","checksum":{"algorithm":"MD5","value":"6BDC7F9459A502964F889D70A335CECE"},"mimetype":"application/xml","isValid":true,"errors":[]},{"path":"representations/rep1/data/plain_text_document.txt","type":"file","size":"12","checksum":{"algorithm":"MD5","value":"A9308BDE501CFD1D91CE4E5E861C8971"},"mimetype":"text/plain","isValid":true,"errors":[]}]},"details":{"name":"structMap_does_not_point_at_Schemas","label":"","oaispackagetype":"SIP","othertype":"","contentinformationtype":"","checksums":[]},"representations":[]}}
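
To make the symptom easier to check, here is a minimal sketch that collects the rule IDs reported under `metadata.schematron_results.messages` and tests whether CSIP100 is among them. `SAMPLE_REPORT` is an abbreviated stand-in for the output above (only `rule_id` and `severity` are kept), and the helper names `fired_rule_ids` and `rule_was_reported` are hypothetical; they are not part of eark-validator.

```python
import json

# Abbreviated stand-in for the validator output quoted above: only the fields
# used here are kept, and the message and location texts are omitted.
SAMPLE_REPORT = json.loads("""
{
  "structure": {"status": "WellFormed", "messages": []},
  "metadata": {
    "schema_results": {"status": "VALID", "messages": []},
    "schematron_results": {
      "status": "INVALID",
      "messages": [
        {"rule_id": "CSIP4", "severity": "Error"},
        {"rule_id": "CSIP5", "severity": "Error"},
        {"rule_id": "CSIP17", "severity": "Warn"},
        {"rule_id": "CSIP31", "severity": "Warn"},
        {"rule_id": "CSIP8", "severity": "Warn"},
        {"rule_id": "CSIP114", "severity": "Error"},
        {"rule_id": "CSIP105", "severity": "Warn"},
        {"rule_id": "CSIP91", "severity": "Warn"},
        {"rule_id": "CSIP92", "severity": "Warn"},
        {"rule_id": "SIP2", "severity": "Error"},
        {"rule_id": "SIP14", "severity": "Error"}
      ]
    }
  }
}
""")


def fired_rule_ids(report):
    """Return the set of rule IDs reported by the Schematron step (hypothetical helper)."""
    results = report.get("metadata", {}).get("schematron_results", {})
    return {message["rule_id"] for message in results.get("messages", [])}


def rule_was_reported(report, rule_id):
    """True if the given rule ID appears among the Schematron messages."""
    return rule_id in fired_rule_ids(report)


if __name__ == "__main__":
    # The package is expected to violate CSIP100, so this should print True
    # once the validator reports the rule; with the output above it is False.
    print(rule_was_reported(SAMPLE_REPORT, "CSIP100"))
```

The same check could be run against the full JSON saved to a file by replacing `SAMPLE_REPORT` with the result of `json.load` on that file, assuming the report keeps the structure shown in this issue.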