Closed agpar closed 8 years ago
I would rather see you spend some time to get it working on Py3. It shouldn't be too hard. They've been having problems with keeping their validator running, and I don't really want to dump lots of manifests on them.
OK - but I'd like to keep using it while developin' so I can get an API up for Will to develop on ASAP-ly.
Validation doesn't need to be a priority for me as long as I can manually get a manuscript indexed. Really, just skipping validation would be fine for short-term development as far as I'm concerned.
As of 5e5211dcb4ddb532e8ee886c841302f493180d1a there's validation happening on our end, and it seems to work (there are not very good tests available for the files I took from the IIIF people).
There might be quite a few bugs in it. Also, I think it should be modified to attempt minor corrections when importing. For instance, http://manifests.ydc2.yale.edu/manifest/Admont23 will fail because its 'height' and 'width' values are strings instead of ints. This is definitely an error on their part, but can easily be checked and possibly corrected on our end.
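For illustration, the kind of minor correction meant here could look roughly like the sketch below. It is only a sketch: the function name `coerce_dimensions` and the returned notes list are hypothetical, not something in the codebase, and it only touches the unambiguous case of a string made up entirely of decimal digits.

```python
def coerce_dimensions(resource):
    """Return a copy of `resource` with numeric-string dimensions as ints.

    Also returns a list of notes describing each correction, so an
    importer could report exactly what it changed.
    """
    fixed = dict(resource)
    notes = []
    for key in ("height", "width"):
        value = fixed.get(key)
        # Only correct the unambiguous case: a string of decimal digits.
        if isinstance(value, str) and value.strip().isdecimal():
            fixed[key] = int(value.strip())
            notes.append(f"{key}: converted string {value!r} to int")
    return fixed, notes


# Hypothetical canvas with the same kind of error as the Yale manifest.
canvas = {"@type": "sc:Canvas", "height": "3000", "width": 2000}
fixed, notes = coerce_dimensions(canvas)
print(fixed["height"], notes)
```

Returning the notes alongside the corrected copy keeps the correction visible instead of silently hiding the publisher's error.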
The purpose of a validator is to validate -- I don't think it should attempt a correction, since that's not just validation.
I'm going to write an e-mail to the IIIF group about validation and buggy manifests.
Speaking of validation, I'm looking at some of the work I did on a JSON schema for the IIIF presentation API, and I'm starting to wonder if I'm missing the forest for the trees here.
In terms of Misirlou's goals, isn't the only difference between a valid and invalid manifest the fact that we can index and display the former but not the latter? Why should we care about how closely a manifest adheres to the API? Based on my experience, nearly everyone is breaking some rule of the API, but we can generally index and display them anyway.
So, do we care about this? Do we need to validate against the syntax of the presentation API? It seems like the only things we really care about are that the document has type sc:Manifest, the context points to the IIIF presentation API, and we are capable of finding and displaying images. Everything else is just nitpicking.
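A minimal sketch of that "can we index and display it" check might look like this. The function names are hypothetical, and the key names and context URL assume a Presentation API 2.x manifest. Note that it assumes every nested value has the expected shape and will raise on manifests that break that assumption.

```python
PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/2/context.json"


def first_image_urls(manifest):
    """Yield one image URL per canvas, for canvases that have one."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if isinstance(resource, dict) and "@id" in resource:
                    yield resource["@id"]
                    break  # one image per canvas is enough to display it


def is_displayable(manifest):
    """Minimal check: right type, right context, at least one image."""
    contexts = manifest.get("@context")
    if isinstance(contexts, str):
        contexts = [contexts]
    return (
        manifest.get("@type") == "sc:Manifest"
        and PRESENTATION_CONTEXT in (contexts or [])
        and any(first_image_urls(manifest))
    )
```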
The front end is probably assuming other things as well (the existence of labels, etc.) I can't think what all they would be off-hand. Some may be quite subtle.
We could write a validator that essentially checks our front end can pull as many pages from the manifest as it says there should be...
The kind of thing I'm worried about is bugs where I expect a string but get a number or an array or something. I don't think we can reliably find those bugs without validating against the spec. (We could also just add a ton of defensive code to the front end, but I think that would be less maintainable.)
Yeah, that makes sense.
On the new-validator branch, I've written a schema validator using a library that works much like JSON schema validation. It only validates manifests, but it should be checking for the correct types and structure.
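To give a rough idea of what checking types and structure against a declarative schema involves, here is a stdlib-only sketch. It is not the code on the branch and does not use the library mentioned above; `check_types` and `CANVAS_SCHEMA` are hypothetical names, and the schema is heavily simplified.

```python
def check_types(document, schema, path=""):
    """Return a list of error strings for missing keys or wrong types.

    `schema` maps a key to either a type (or tuple of types), or to a
    nested dict describing the expected structure of a dict value.
    """
    errors = []
    for key, expected in schema.items():
        where = f"{path}/{key}"
        if key not in document:
            errors.append(f"{where}: missing")
            continue
        value = document[key]
        if isinstance(expected, dict):
            # Nested schema: recurse only if the value really is an object.
            if isinstance(value, dict):
                errors.extend(check_types(value, expected, where))
            else:
                errors.append(f"{where}: expected object")
        elif not isinstance(value, expected):
            errors.append(f"{where}: unexpected type {type(value).__name__}")
    return errors


# Hypothetical, heavily simplified schema for a canvas.
CANVAS_SCHEMA = {"@id": str, "@type": str, "label": str, "height": int, "width": int}

errors = check_types(
    {"@id": "http://example.org/canvas/1", "@type": "sc:Canvas",
     "label": "f. 1r", "height": "3000", "width": 2000},
    CANVAS_SCHEMA,
)
print(errors)
```

Collecting errors in a list rather than raising on the first one means a single import attempt reports every problem in a manifest.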
There might be some holes in the net now, but we'll have to start importing stuff and see.
How far are we from importing stuff?
We're where we were before this commit. It doesn't really move us in any direction, beyond no longer needing to rely on the validator supplied by the IIIF people (which was extremely large and was prompting an extra HTTP GET).
Maybe we should close this issue and submit new issues as problems with the new validator emerge.
This looks really good! Question from a quick look (which probably doesn't deserve its own issue): is service repeatable? It should be, right? I think the same is true of @context.
I don't think service is repeatable... can you give me an example where it might be?
The spec says that "Any resource may have one or more links to an external service." I can't think of any obvious use cases, but search + something, or physical dimensions + something seems plausible.
Sorry I missed this @wabain. I don't know if services were repeatable when you asked this, but they certainly are now.
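    def service(value):
        """Validate against Service sub-schema."""
        if isinstance(value, str):
            # A bare string is treated as a URI.
            return uri(value)
        elif isinstance(value, list):
            # Services are repeatable: validate each entry in turn.
            return [service(val) for val in value]
        else:
            return _service_sub(value)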
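To try the three accepted shapes (a URI string, a single service object, or a list of either), something like the following self-contained sketch could be used. The `uri` and `_service_sub` bodies here are hypothetical stand-ins written only for this example, not the validator's real helpers, and having every branch return the validated value is also an assumption.

```python
def uri(value):
    """Stand-in URI check (hypothetical): require an http(s) string."""
    if not (isinstance(value, str) and value.startswith(("http://", "https://"))):
        raise ValueError(f"not a URI: {value!r}")
    return value


def _service_sub(value):
    """Stand-in sub-schema (hypothetical): a dict with a valid '@id'."""
    if not isinstance(value, dict) or "@id" not in value:
        raise ValueError(f"not a service object: {value!r}")
    uri(value["@id"])
    return value


def service(value):
    """Accept a URI string, a service object, or a list of either."""
    if isinstance(value, str):
        return uri(value)
    if isinstance(value, list):
        return [service(item) for item in value]
    return _service_sub(value)


single = {"@id": "http://example.org/search"}
repeated = ["http://example.org/search", {"@id": "http://example.org/dimensions"}]
print(service(single))
print(service(repeated))
```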
The presentation validator distributed by IIIF does not run on Python 3. On their website they encourage people to send manifests to
http://iiif.io/api/presentation/validator/service/validate?format=json&version=2.0&url=[manifest-url-here]
and have them validated for you. This returns a JSON object with results.
Notice it caught the very two things we were worried about in #3. That's extremely encouraging.
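For reference, a request to the hosted validator could be built roughly as below. This is a sketch: the function names are hypothetical, the endpoint and query parameters are the ones quoted above, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

VALIDATOR_ENDPOINT = "http://iiif.io/api/presentation/validator/service/validate"


def validation_url(manifest_url, version="2.0"):
    """Build the validator request URL, percent-encoding the manifest URL."""
    query = urlencode({"format": "json", "version": version, "url": manifest_url})
    return f"{VALIDATOR_ENDPOINT}?{query}"


def validate_remotely(manifest_url, timeout=30):
    """Ask the hosted validator to check a manifest; return the parsed JSON."""
    with urlopen(validation_url(manifest_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(validation_url("http://manifests.ydc2.yale.edu/manifest/Admont23"))
```

Percent-encoding the manifest URL matters when that URL carries its own query string, which would otherwise be read as parameters to the validator.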
Any moral/technical objections to me validating all the stuff on their servers? I'd like to avoid downgrading to Python 2 or porting their validator to Python 3 (at least in the short term). I think porting their validator to Python 3 would be an admirable side-goal of this project; in fact, I'd like to do it, but I'd also like to put more meat on this project's bones before getting distracted.
Thoughts?