DDMAL / musiclibs

:guitar: Searching IIIF Manifests

IIIF validation. #4

Closed agpar closed 8 years ago

agpar commented 9 years ago

The presentation validator distributed by IIIF does not run on Python 3. On their website, they encourage people to send manifests to

http://iiif.io/api/presentation/validator/service/validate?format=json&version=2.0&url=[manifest-url-here]

and they will validate them for you. The service returns a JSON object with the results:

{
  "url": "http://www.e-codices.unifr.ch/metadata/iiif/kba-0003/manifest.json",
  "error": "None",
  "okay": 1,
  "warnings": [
    "URL does not have correct content-type header: got \"text/html\", expected JSON",
    "WARNING: Setting non-standard field 'see_also' on resource of type 'sc:Manifest' "
  ]
}

Notice it caught the very two things we were worried about in #3. That's extremely encouraging.
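Because the hosted validator is just an HTTP GET with three query parameters, calling it from Python 3 needs only the standard library. This is a minimal sketch, assuming the endpoint and the okay/error/warnings fields shown in the sample response above; the function names are hypothetical and the service may no longer be available at that address.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted in this thread; the hosted service may have moved since.
VALIDATOR_ENDPOINT = "http://iiif.io/api/presentation/validator/service/validate"


def build_validation_url(manifest_url, version="2.0"):
    """Build the GET URL for the hosted validator (hypothetical helper).

    urlencode percent-encodes the manifest URL so it can be embedded
    safely as the value of the `url` query parameter.
    """
    query = urlencode({"format": "json", "version": version, "url": manifest_url})
    return VALIDATOR_ENDPOINT + "?" + query


def parse_validation_result(body):
    """Convert the service's JSON body into (okay, warnings, error)."""
    result = json.loads(body)
    okay = bool(result.get("okay"))
    warnings = list(result.get("warnings", []))
    # The sample response reports the string "None" when there is no error.
    error = result.get("error")
    if error in (None, "None"):
        error = None
    return okay, warnings, error


def validate_remotely(manifest_url, timeout=30):
    """Ask the hosted service to validate a manifest (performs a network request)."""
    with urlopen(build_validation_url(manifest_url), timeout=timeout) as response:
        return parse_validation_result(response.read().decode("utf-8"))
```

Note that the manifest URL ends up percent-encoded in the query string (`http%3A%2F%2F...`), whereas the template URL above shows it raw; a standard server decodes both the same way.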

Any moral or technical objections to me validating everything on their servers? I'd like to avoid downgrading to Python 2 or porting their validator to Python 3, at least in the short term. Porting their validator to Python 3 would be an admirable side goal of this project, and in fact I'd like to do it, but I'd also like to put more meat on this project's bones before getting distracted.

Thoughts?

ahankinson commented 9 years ago

I would rather see you spend some time getting it working on Py3. It shouldn't be too hard. They've been having problems keeping their validator running, and I don't really want to dump lots of manifests on them.

agpar commented 9 years ago

OK, but I'd like to keep using it while developing so I can get an API up for Will to develop on ASAP.

wabain commented 9 years ago

Validation doesn't need to be a priority for me as long as I can manually get a manuscript indexed. Really, just skipping validation would be fine for short-term development as far as I'm concerned.

agpar commented 8 years ago

As of commit 5e5211dcb4ddb532e8ee886c841302f493180d1a, validation happens on our end and seems to work (though there aren't very good tests available for the files I took from the IIIF people).

There might be quite a few bugs in it. Also, I think it should be modified to attempt minor corrections when importing. For instance, http://manifests.ydc2.yale.edu/manifest/Admont23 will fail because its 'height' and 'width' values are strings instead of ints. This is definitely an error on their part, but can easily be checked and possibly corrected on our end.
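The proposed correction could be a small, explicit step that runs before validation. This is a minimal sketch, assuming IIIF Presentation 2.0 manifests where canvases live under `sequences[*].canvases`; `coerce_dimensions` and `coerce_canvas_dimensions` are hypothetical names, not part of the project's code.

```python
def coerce_dimensions(resource):
    """Return a copy of `resource` whose string 'height'/'width' values are ints.

    Raises ValueError if a string value cannot be read as an integer.
    """
    fixed = dict(resource)
    for key in ("height", "width"):
        value = fixed.get(key)
        if isinstance(value, str):
            try:
                fixed[key] = int(value.strip())
            except ValueError:
                raise ValueError("{} is not an integer: {!r}".format(key, value)) from None
    return fixed


def coerce_canvas_dimensions(manifest):
    """Apply coerce_dimensions to every canvas, modifying `manifest` in place."""
    for sequence in manifest.get("sequences", []):
        sequence["canvases"] = [
            coerce_dimensions(canvas) for canvas in sequence.get("canvases", [])
        ]
    return manifest
```

Keeping the correction separate from the validator means the validator itself can stay strict and only report problems.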

ahankinson commented 8 years ago

The purpose of a validator is to validate. I don't think it should attempt corrections, since that goes beyond validation.

I'm going to write an e-mail to the IIIF group about validation and buggy manifests.

agpar commented 8 years ago

Speaking of validation, I'm looking at some of the work I did on a JSON schema for the IIIF Presentation API, and I'm starting to wonder if I'm missing the forest for the trees here.

In terms of Misirlou's goals, isn't the only difference between a valid and invalid manifest the fact that we can index and display the former but not the latter? Why should we care about how closely a manifest adheres to the API? Based on my experience, nearly everyone is breaking some rule of the API, but we can generally index and display them anyway.

So, do we care about this? Do we need to validate against the syntax of the Presentation API? It seems like the only things we really care about are that the document has type sc:Manifest, its context points to the IIIF Presentation API, and we are capable of finding and displaying images. Everything else is just nitpicking.
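The three conditions above can be checked in a few lines. This is a sketch, assuming Presentation 2.0 manifests (context `http://iiif.io/api/presentation/2/context.json`, images attached to canvases as annotations whose `resource` has an `@id`); `first_image_url` and `minimal_problems` are hypothetical names.

```python
# IIIF Presentation API 2.x context URI.
PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/2/context.json"


def first_image_url(manifest):
    """Return the @id of the first image resource found on any canvas, else None."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for annotation in canvas.get("images", []):
                resource = annotation.get("resource")
                if isinstance(resource, dict) and isinstance(resource.get("@id"), str):
                    return resource["@id"]
    return None


def minimal_problems(manifest):
    """List reasons the manifest cannot be indexed and displayed (empty list = usable)."""
    problems = []
    if manifest.get("@type") != "sc:Manifest":
        problems.append("@type is not sc:Manifest")
    context = manifest.get("@context")
    # @context may be a single URI or a list of URIs/objects.
    contexts = context if isinstance(context, list) else [context]
    if PRESENTATION_CONTEXT not in contexts:
        problems.append("@context does not reference the Presentation API")
    if first_image_url(manifest) is None:
        problems.append("no displayable image found")
    return problems
```

A check this loose accepts most real-world manifests, but it says nothing about the types of other fields the front end may rely on.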

wabain commented 8 years ago

The front end is probably assuming other things as well (the existence of labels, etc.). I can't think of what they all would be offhand, and some may be quite subtle.

agpar commented 8 years ago

We could write a validator that essentially checks that our front end can pull as many pages from the manifest as it says there should be...
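A page-count check of this kind might compare the number of canvases with the number of canvases that actually carry an image. This sketch assumes that "pages" means the canvases of the first sequence in a Presentation 2.0 manifest; the function names are hypothetical.

```python
def _has_image_url(annotation):
    """True if an image annotation points at a resource with a string @id."""
    resource = annotation.get("resource") if isinstance(annotation, dict) else None
    return isinstance(resource, dict) and isinstance(resource.get("@id"), str)


def count_displayable_pages(manifest):
    """Return (declared, displayable) canvas counts for the first sequence."""
    sequences = manifest.get("sequences") or []
    if not sequences:
        return 0, 0
    canvases = sequences[0].get("canvases") or []
    displayable = 0
    for canvas in canvases:
        if not isinstance(canvas, dict):
            continue  # a malformed canvas can never be displayed
        if any(_has_image_url(a) for a in canvas.get("images") or []):
            displayable += 1
    return len(canvases), displayable


def all_pages_displayable(manifest):
    """True when every declared canvas has at least one usable image."""
    declared, displayable = count_displayable_pages(manifest)
    return declared > 0 and declared == displayable
```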

wabain commented 8 years ago

The kind of thing I'm worried about is bugs where I expect a string but get a number or an array or something. I don't think we can reliably find those bugs without validating against the spec. (We could also just add a ton of defensive code to the front end, but I think that would be less maintainable.)

agpar commented 8 years ago

Yeah, that makes sense.

agpar commented 8 years ago

On the new-validator branch, I've written a schema validator using a library very similar to JSON Schema validation. It only validates manifests, but it should be checking for the correct types and structure.
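The general shape of such a validator is a tree of small validator functions that each check one value and report the path to the first mismatch. The sketch below is stdlib-only and is not the library used on the branch; `Invalid`, `of_type`, `list_of`, and `obj` are hypothetical names, and the schemas cover only an illustrative subset of Presentation 2.0.

```python
class Invalid(Exception):
    """Raised when a value does not match the schema."""


def of_type(*types):
    """Validator accepting only instances of `types` (bool is not treated as int)."""
    def check(value, path):
        if isinstance(value, bool) and bool not in types:
            ok = False
        else:
            ok = isinstance(value, types)
        if not ok:
            expected = "/".join(t.__name__ for t in types)
            raise Invalid("{}: expected {}, got {}".format(path, expected, type(value).__name__))
        return value
    return check


def list_of(validator):
    """Validator for a JSON array whose items all satisfy `validator`."""
    def check(value, path):
        if not isinstance(value, list):
            raise Invalid("{}: expected list, got {}".format(path, type(value).__name__))
        for index, item in enumerate(value):
            validator(item, "{}[{}]".format(path, index))
        return value
    return check


def obj(required, optional=None):
    """Validator for a JSON object with required and optional keys."""
    optional = optional or {}
    def check(value, path):
        if not isinstance(value, dict):
            raise Invalid("{}: expected object, got {}".format(path, type(value).__name__))
        for key, validator in required.items():
            if key not in value:
                raise Invalid("{}: missing required key {!r}".format(path, key))
            validator(value[key], "{}.{}".format(path, key))
        for key, validator in optional.items():
            if key in value:
                validator(value[key], "{}.{}".format(path, key))
        return value
    return check


# Illustrative subset of the Presentation 2.0 rules, not the full specification.
canvas_schema = obj(required={
    "@id": of_type(str),
    "@type": of_type(str),
    "label": of_type(str, list, dict),
    "height": of_type(int),
    "width": of_type(int),
})
manifest_schema = obj(
    required={"@context": of_type(str, list), "@id": of_type(str),
              "@type": of_type(str), "label": of_type(str, list, dict)},
    optional={"sequences": list_of(obj(required={"canvases": list_of(canvas_schema)}))},
)
```

With this structure, the string `height` from the Yale manifest mentioned earlier would be rejected with a path such as `manifest.sequences[0].canvases[0].height`, which is the kind of type bug the front end cannot easily defend against.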

There might be some holes in the net now, but we'll have to start importing stuff and see.

ahankinson commented 8 years ago

How far are we from importing stuff?

agpar commented 8 years ago

We're where we were before this commit. It doesn't really move us in any direction, beyond no longer needing to rely on the validator supplied by the IIIF people (which was extremely large and was prompting an extra HTTP GET).

agpar commented 8 years ago

Maybe we should close this issue and submit new issues as problems with the new validator emerge.

wabain commented 8 years ago

This looks really good! Question from a quick look (which probably doesn't deserve its own issue): is service repeatable? It should be, right? I think the same is true of @context.

ahankinson commented 8 years ago

I don't think service is repeatable... can you give me an example where it might be?

wabain commented 8 years ago

The spec says that "Any resource may have one or more links to an external service." I can't think of any obvious use cases, but search + something, or physical dimensions + something, seem plausible.
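"One or more" means a consumer has to accept `service` either as a single value or as an array. A common way to handle that is to normalize to a list before validating or reading it. This is a sketch, assuming each service is either a bare URI string or an object with an `@id`; `as_list` and `service_ids` are hypothetical helpers, and the example URLs are placeholders.

```python
def as_list(value):
    """Normalize a property that may hold one value or an array of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def service_ids(resource):
    """Collect the @id of every service linked from a resource."""
    ids = []
    for svc in as_list(resource.get("service")):
        if isinstance(svc, str):
            ids.append(svc)  # service given as a bare URI
        elif isinstance(svc, dict) and isinstance(svc.get("@id"), str):
            ids.append(svc["@id"])  # service given as an object
        else:
            raise ValueError("unrecognized service entry: {!r}".format(svc))
    return ids
```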

agpar commented 8 years ago

Sorry I missed this, @wabain. I don't know whether services were repeatable when you asked this, but they certainly are now.

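def service(value):
    """Validate against Service sub-schema.

    A service may be a URI string, a service object, or a list of either.
    `uri` and `_service_sub` are defined elsewhere in the validator module.
    """
    if isinstance(value, str):
        uri(value)
        return value
    elif isinstance(value, list):
        # Services are repeatable: validate each entry recursively.
        for val in value:
            service(val)
        return value
    else:
        return _service_sub(value)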