DILCISBoard / E-ARK-SIP

E-ARK SIP specification
https://earksip.dilcis.eu/
Creative Commons Attribution 4.0 International

Check that the example appendix in the METS profile is correct and follows the latest version. #99

Closed jmaferreira closed 2 years ago

jmaferreira commented 3 years ago

Several questions have been asked:

For SIP validation we use the commons-ip library available on git. There are principally two versions of this library, 1.0.3 and 2.0.0-alpha1, and we noticed that the XSD schema used for the structural validation of the XML is hard-coded ("/schemas/mets1_11.xsd" or "/schemas2/mets1_12.xsd"). We did not understand which elements, at the level of the METS.xml file, make it possible to distinguish which version to use. After reading some documentation, we see that there are several versions of METS (although some are quite old). Is there a way to configure which XSD to use?

Without changing the code and recompiling, it is not possible to change the schema. For more information, I suggest opening an issue at https://github.com/keeps/commons-ip
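On the question of what can be read from the METS.xml file itself: an XML document can declare the schema it was written against in the `xsi:schemaLocation` attribute of its root element. The following is a minimal sketch, assuming the package's METS.xml carries that attribute; the function name `declared_schemas` and the sample document (including its `schemas/mets1_12.xsd` location) are made up for illustration. It only reports what the file declares and does not change which schema commons-ip validates against.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def declared_schemas(mets_xml: str) -> dict[str, str]:
    """Return {namespace: schema location} pairs declared on the root element."""
    root = ET.fromstring(mets_xml)
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # xsi:schemaLocation is a whitespace-separated list of
    # namespace/location pairs, so pair up even and odd tokens.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Hypothetical METS root element, used only to exercise the function.
sample = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/METS/ schemas/mets1_12.xsd"/>"""

print(declared_schemas(sample))
```

A declared METS schema version is not the same thing as the E-ARK specification version, so this is at best one hint among several.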

We have retrieved various SIPs stored in different E-ARK projects on git, and generally (but not always) we can validate them with version 1.0.3 or 2.0.0-alpha1 (probably because they use the same library to generate them), but when we try to validate them with other tools it does not work. For example, we tried the online application "https://eark.openpreservation.org/" but never got a validation that works.

Commons-IP is undergoing a major update, and a new validation module is being developed as we speak. The alpha version was still a long way from being finished. Could you please check again using the latest version? It is available at https://github.com/keeps/commons-ip/releases

Please make sure you are experimenting with the correct versions of packages. For example, EARK v1 packages will not be valid according to the EARK v2 specification.

With these first tests we are not very confident about the validation process. Do you have any advice for us regarding it? Here is an example of a check done in the commons-ip 1.0.3 library: in the EARKUtils class, the getEARKStructMap method checks that the LABEL attribute of the structMap element is either "Common Specification structural map" or "E-ARK structural map". That is why validation failed in our case (perhaps there are other issues too). For information, I have attached a SIP to this mail (generated with the web app "https://earkweb.sydarkivera.se/earkweb/submission/overview"); its validation fails with the commons-ip library.

The structMap/@LABEL value "Common Specification structural map" was the one used in version 1 of the Common Specification (see https://dilcis.eu/images/Specifications/CS/Common_Specifications_for_IPs_v10.pdf).

Version 2 uses a different vocabulary. The @LABEL is expected to be "CSIP" instead of "Common Specification structural map". The SIP specification does not make changes to the inherited vocabulary as described in the specification:

I did not have access to the SIP created with EARK WEB so I can't comment on the reasons why it is not valid.
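The LABEL difference described above can be checked before running a validator. This is a minimal sketch rather than the commons-ip logic: the function name `guess_spec_versions` and the sample document are hypothetical, and treating "E-ARK structural map" as a version 1 value is an assumption based only on the fact that commons-ip 1.0.3 accepts it.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# structMap LABEL values quoted in this thread.
LABEL_TO_VERSION = {
    "Common Specification structural map": "v1",
    "E-ARK structural map": "v1",  # assumption: accepted by commons-ip 1.0.3
    "CSIP": "v2",
}


def guess_spec_versions(mets_xml: str) -> set[str]:
    """Guess the specification version(s) from top-level structMap LABELs."""
    root = ET.fromstring(mets_xml)
    labels = [sm.get("LABEL") for sm in root.findall("mets:structMap", METS_NS)]
    # Missing or unrecognised labels are reported as "unknown".
    return {LABEL_TO_VERSION.get(label, "unknown") for label in labels}


# Hypothetical, heavily truncated METS document for illustration.
sample = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="PHYSICAL" LABEL="CSIP"><div/></structMap>
</mets>"""

print(guess_spec_versions(sample))
```

If the result is "v1", the package should be checked with a version 1 validator; a version 2 validator is expected to reject it.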

Are there any plans to verify eIDAS or other digital signature formats?

Not at the moment, as far as I know.

Are there plans to check the long-term preservation formats (PDF, TIFF, etc.)?

No. EARK only cares about the packaging, not validating content. For content validation there are other tools that you may use, e.g. the ones developed under the PREFORMA project - http://www.preforma-project.eu/open-source-portal.html

After checking the source code, we see that it is not possible to deactivate the verification of fingerprints (checksums) or, more generally, to deactivate certain stages of the validation. Are any such code enhancements planned in the roadmap?

A new validation module is being developed under https://github.com/keeps/commons-ip. The final version should be released towards the end of October 2021.

In its current state, the validator checks all the requirements of the CSIP. There are no plans to allow certain requirements to be disabled.
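For readers who want to see what the fingerprint check involves, or who need to run it separately from the validator, here is a minimal standalone sketch. It is not the commons-ip implementation. It assumes that each METS `file` element carries `CHECKSUM` and `CHECKSUMTYPE` attributes, and that its `FLocat` child points to the file with an `xlink:href` relative to the package root; the function name `checksum_mismatches` is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def checksum_mismatches(package_dir: Path) -> list[str]:
    """List the hrefs of files whose digest differs from the METS CHECKSUM."""
    root = ET.parse(package_dir / "METS.xml").getroot()
    bad = []
    for file_el in root.iter(f"{METS}file"):
        flocat = file_el.find(f"{METS}FLocat")
        expected = file_el.get("CHECKSUM")
        if flocat is None or expected is None:
            continue  # nothing to compare for this entry
        # Map CHECKSUMTYPE values such as "SHA-256" to hashlib names ("sha256").
        algo = file_el.get("CHECKSUMTYPE", "SHA-256").replace("-", "").lower()
        href = flocat.get(XLINK_HREF, "")
        # Assumption: href is a plain path relative to the package root.
        data = (package_dir / href).read_bytes()
        if hashlib.new(algo, data).hexdigest() != expected.lower():
            bad.append(href)
    return bad
```

Calling `checksum_mismatches(Path("path/to/package"))` returns an empty list when every listed file matches its recorded checksum. A missing file raises an error rather than being reported as a mismatch.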

jmaferreira commented 2 years ago

@karinbredenberg The examples outlined in this line of thought are all correct.