
CSV Schema
http://digital-preservation.github.io/csv-schema
Mozilla Public License 2.0

How does integrityCheck know which folders to check? #34

Open logicplace opened 3 years ago

logicplace commented 3 years ago

I'm struggling a bit to understand how to implement integrityCheck. I looked at the examples, but the test cases were too well-formed to really explain it to me.

For instance, if you have a CSV like:

filepath,foo
file:///C:/a/content/b.txt,bar
file:///C:/a/content/c.txt,baz
file:///C:/b/content/a.png,boo

Are both C:\a\content and C:\b\content checked? If "content" (or whichever subfolder was supplied) wasn't the last folder in the path, would that cause a schema validation error?
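
To make the question concrete, here is a minimal sketch of which folders the first CSV above implies. It is only an illustration of the question, not a description of what the validator actually does: the helper name parent_folders and the idea of grouping rows by parent folder are assumptions of mine.

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The first example CSV from this issue, inlined so the sketch is self-contained.
CSV_TEXT = """filepath,foo
file:///C:/a/content/b.txt,bar
file:///C:/a/content/c.txt,baz
file:///C:/b/content/a.png,boo
"""


def parent_folders(csv_text, column="filepath"):
    """Return the distinct parent folders of the file URIs in `column`.

    Hypothetical helper: it only inspects the CSV text and never touches disk.
    """
    folders = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # urlparse("file:///C:/a/content/b.txt").path is "/C:/a/content/b.txt"
        path = PurePosixPath(urlparse(row[column]).path)
        folders.add(str(path.parent))
    return sorted(folders)


folders = parent_folders(CSV_TEXT)

# For each implied folder, record whether its last path segment is "content".
ends_with_content = {f: PurePosixPath(f).name == "content" for f in folders}

for folder, ok in ends_with_content.items():
    print(folder, "ends with 'content':", ok)
```

Both C:/a/content and C:/b/content come out of this, which is why I'm asking whether both of them get checked.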

Also, if relative paths are used:

filepath,foo
b.txt,bar
c.txt,baz
a.png,boo

Would filepath: integrityCheck("excludeFolder") (or includeFolder, I guess?) check in %cd%\content? Is a prefix required?

DavidUnderdown commented 3 years ago

It is a bit tricky, and it is one area where there is more of an implicit assumption that people are following our practice for folder structures. I need to think a bit about it myself. I think in the first example, apart from anything else, you would need to specify excludeFolder, as you do not have an explicit line that specifies the parent folder itself (i.e. a line with filepath file:///C:/a/ and file:///C:/b).
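
As a rough sketch of the point about explicit folder lines: the snippet below checks whether the folders named in the comment above have rows of their own in the filepath column. The helper missing_folder_rows is hypothetical and reflects only one reading of this comment; it is not the validator's logic.

```python
from urllib.parse import urlparse


def _norm(uri):
    """Reduce a file URI to its path without a trailing slash."""
    return urlparse(uri).path.rstrip("/")


def missing_folder_rows(filepaths, expected_folders):
    """Return the expected folders that have no row of their own.

    Hypothetical helper: `expected_folders` is whatever set of folders
    the CSV author thinks should be listed explicitly.
    """
    listed = {_norm(p) for p in filepaths}
    return [f for f in expected_folders if _norm(f) not in listed]


# filepath column of the first example in this issue
rows = [
    "file:///C:/a/content/b.txt",
    "file:///C:/a/content/c.txt",
    "file:///C:/b/content/a.png",
]

# The two parent folders mentioned in the comment above
expected = ["file:///C:/a/", "file:///C:/b"]

missing = missing_folder_rows(rows, expected)
print(missing)
```

Neither folder has its own row in that example, which is the situation where excludeFolder would seem to be the option to use.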

logicplace commented 3 years ago

Oh okay, I think I misunderstood that argument then. So when using includeFolder, is it accurate to say that all folders which should be checked are explicitly listed in that column in the CSV? And similarly, that all folders explicitly listed there should be checked? (Is it a schema error if the last folder in such a path isn't "content"?)

If that's the case, then I suppose the only confusion is the case of excludeFolder.