E-ARK-Software / eark-validator

E-ARK Python Information Package validation library
Apache License 2.0
5 stars 3 forks source link

Validating CSIP20 #51

Open dockmd opened 1 month ago

dockmd commented 1 month ago

There are 3 issues with validating the CSIP20 requirement:

This can be seen when testing rule 1, package "IP_18000_CSIP20_1" and rule 2, packages "IP_18000_CSIP20_2" and "IP_18000_CSIP20_3" from testCase.xml

Sunday-Crunk commented 1 month ago

I'm not certain about issue 1, see notes under the tests, but aside from that fixes look good

CSIP20 The validator throws validation error when 'Status' attribute is not used.

Test

eark-validator /eark-test-corpus/eark-ip-test-corpus/corpus/CSIP/CSIP20/valid/IP_18000_CSIP20_1

Result prepatch

Warning severity message reported.

{
        "rule_id": "CSIP20",
        "severity": "Warning",
        "location": {
          "context": "/mets:mets/mets:dmdSec",
          "test": "@STATUS",
          "description": "/*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='dmdSec' and namespace-uri()='http://www.loc.gov/METS/']"
        },
        "message": "SHOULD be used to indicated the status of the package."
      }

Fix result

No rule for CSIP20

Notes:

I might be mistaken, but I believe the original behaviour was expected. The validator returns a "Warning" severity message for CSIP20 when no @STATUS attribute is present, which seems consistent with the requirement that the dmdSec "should" have a @STATUS attribute.

CSIP20 The validator doesn't throw validation error when 'Status' attribute has incorrect value (not from vocabulary).

Test

eark-validator /eark-test-corpus/eark-ip-test-corpus/corpus/CSIP/CSIP20/invalid/IP_18000_CSIP20_2

Result prepatch

Fail: No entry for CSIP20

Fix result

Pass: Rule reported correctly.

[
    {
        "rule_id": "CSIP20",
        "severity": "Error",
        "location": {
            "context": "/mets:mets/mets:dmdSec",
            "test": "not(@STATUS) or ((@STATUS = 'SUPERSEDED') or (@STATUS = 'CURRENT'))",
            "description": "/*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='dmdSec' and namespace-uri()='http://www.loc.gov/METS/']"
        },
        "message": "SHOULD be used to indicated the status of the package."
    }
]

Test

eark-validator /eark-test-corpus/eark-ip-test-corpus/corpus/CSIP/CSIP20/valid/IP_18000_CSIP20_5

Result prepatch

Pass: No entry for CSIP20

Fix result

Pass: No entry for CSIP20

CSIP20 The validator doesn't throw validation error when 'Status' attribute value is lowercase.

Test

eark-validator /eark-test-corpus/eark-ip-test-corpus/corpus/CSIP/CSIP20/invalid/IP_18000_CSIP20_3

Result prepatch

Fail: No error entry for CSIP20

Fix result

Pass: rule reported correctly.

[
    {
        "rule_id": "CSIP20",
        "severity": "Error",
        "location": {
            "context": "/mets:mets/mets:dmdSec",
            "test": "not(@STATUS) or ((@STATUS = 'SUPERSEDED') or (@STATUS = 'CURRENT'))",
            "description": "/*[local-name()='mets' and namespace-uri()='http://www.loc.gov/METS/']/*[local-name()='dmdSec' and namespace-uri()='http://www.loc.gov/METS/']"
        },
        "message": "SHOULD be used to indicated the status of the package."
    }
]
dockmd commented 2 weeks ago

@Sunday-Crunk warning with proper severity should be generated now.