KOST-CECO / KOST-Val

The KOST-Val application is used to validate files and Submission Information Packages (SIP).
http://coptr.digipres.org/KOST-Val

SIARD: Wrong "Invalid" due to "additional primary files" #58

Closed stighemmer closed 3 years ago

stighemmer commented 3 years ago

When a SIARD file contains a Large Object, KOST-Val will complain that

J) additional primary files The following entries are not referenced in the SIARD file: [C:\Users\sh1u.kost-val_2x\temp_KOST-Val\SIARD\content\schema0\table0\lob1, C:\Users\sh1u.kost-val_2x\temp_KOST-Val\SIARD\content\schema0\table0\lob1\seg0, C:\Users\sh1u.kost-val_2x\temp_KOST-Val\SIARD\content\schema0\table0\lob1\seg0\rec1.txt].xml: "{1}"

even though it is clearly referenced. Also, the error message seems to be broken.

Test cases (GitHub doesn't like SIARD files, so I have renamed them .ZIP): TestDB-without.zip, TestDB-with.zip

Resulting log file (GitHub doesn't like XML files, so I have removed the file ending): TestDB.kost-val.log

The first test case is a file created by Spectral Core's Full Convert. As I read the standard, this file is incorrect in that its "lobFolder" elements do not contain trailing directory slashes. This should not validate.

I have corrected this in the second test case. From the corrected header\metadata.xml:

<siardArchive> ...
  <lobFolder>content/</lobFolder>
  ...<column>
        <lobFolder>schema0/table0/lob1/</lobFolder>
     </column>
</siardArchive>

and table0.xml

...
  <c1 file="seg0/rec1.txt"/>
...

As I read the SIARD 2.1 (or 2.1.1) standard, this second test case should validate.
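
For anyone who wants to check their own files for the missing trailing slashes described above, here is a minimal sketch. The function name `lob_folders_missing_slash` and the simplified sample XML are made up for illustration; a real metadata.xml has more structure and usually an XML namespace, which the sketch ignores by comparing local element names only.

```python
import xml.etree.ElementTree as ET


def lob_folders_missing_slash(metadata_xml: str) -> list[str]:
    """Return the text of every lobFolder element that lacks a trailing '/'."""
    root = ET.fromstring(metadata_xml)
    missing = []
    for elem in root.iter():
        # Drop a "{namespace}" prefix, if any, so the check works either way.
        local_name = elem.tag.rsplit("}", 1)[-1]
        value = (elem.text or "").strip()
        if local_name == "lobFolder" and value and not value.endswith("/"):
            missing.append(value)
    return missing


# Hypothetical, heavily simplified metadata for demonstration only.
sample = """<siardArchive>
  <lobFolder>content</lobFolder>
  <column><lobFolder>schema0/table0/lob1/</lobFolder></column>
</siardArchive>"""

print(lob_folders_missing_slash(sample))  # ['content']
```

This only flags the values; whether a missing slash should make the file invalid is a question of how the standard is read.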

Chlara commented 3 years ago

Thanks for the issue.

I have added validation of relative URIs in the new version 2.0.5 (prerelease). Can you check it? Both files (with and without) should now be valid.

https://github.com/KOST-CECO/KOST-Val/releases/tag/v2.0.5

stighemmer commented 3 years ago

Tested version 2.0.5 (prerelease); it validates both files.

Technically, it should have complained about TestDB-without, but I don't care.
As far as I am concerned, this issue is closed.

For the curious: (I repeat, this is NOT AN ISSUE) If you unify URIs "/a/b/c" and "d/e", the result should be "/a/b/d/e". If you unify URIs "/a/b/c/" and "d/e", the result should be "/a/b/c/d/e".
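
The two cases above can be reproduced with Python's standard `urllib.parse.urljoin`, which follows the usual relative-reference resolution rules. This is only a sketch: the helper `lob_path` and the choice of "/" as the archive root are assumptions made for illustration, not how KOST-Val or the SIARD standard define the lookup.

```python
from urllib.parse import urljoin

# The two cases from the comment above.
print(urljoin("/a/b/c", "d/e"))   # /a/b/d/e
print(urljoin("/a/b/c/", "d/e"))  # /a/b/c/d/e


def lob_path(archive_lob_folder: str, column_lob_folder: str, file_attr: str) -> str:
    """Hypothetical helper: chain the three values, treating "/" as the archive root."""
    base = urljoin("/", archive_lob_folder)
    base = urljoin(base, column_lob_folder)
    return urljoin(base, file_attr)


# With trailing slashes, every folder level is kept.
print(lob_path("content/", "schema0/table0/lob1/", "seg0/rec1.txt"))
# /content/schema0/table0/lob1/seg0/rec1.txt

# Without them, the last segment of each base is replaced.
print(lob_path("content", "schema0/table0/lob1", "seg0/rec1.txt"))
# /schema0/table0/seg0/rec1.txt
```

Under strict resolution, the values without trailing slashes point somewhere other than the actual lob1 folder, which is why TestDB-without technically should not validate.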