IIIF / discovery


Support for Non IIIF resources #82

Closed glenrobson closed 3 years ago

glenrobson commented 3 years ago

This use case came up from a discussion in the IIIF training course last week.

I am a content aggregator and want to switch away from OAI-PMH to IIIF discovery to harvest digital resources. As well as IIIF Images, Audio and Video I also harvest PDFs and Word documents. If I am to move away from OAI-PMH I need the discovery solution to encompass both IIIF resources and non IIIF resources.

I believe it's possible to reference activities for non-IIIF resources, but how far do we go to support this use case? Do we include non-IIIF examples in the spec, or is this a recipe?

azaroth42 commented 3 years ago

It's a IIIF Specification about making IIIF resources discoverable. I feel pretty strongly that the spec should not go into those details, which will be many, varied and sometimes unsolvable. PMH also does not support PDFs or Word documents, or indeed anything other than metadata records in XML, so the described use case is somewhat dubious.

A recipe, note or other document describing a non-normative way to do non IIIF resources would be fine... but for 1.0 we need to demonstrate implementations, so tying 1.0 to non IIIF resources would be decidedly problematic.

glenrobson commented 3 years ago

> A recipe, note or other document describing a non-normative way to do non IIIF resources would be fine... but for 1.0 we need to demonstrate implementations, so tying 1.0 to non IIIF resources would be decidedly problematic.

For 1.0 it would be good to have an agreed example of what this would look like either in a published recipe (probably unlikely with the timescale of 1.0) or just a recipe issue so I can point to an example when persuading people to make the switch.

> PMH also does not support PDFs or Word documents, or indeed anything other than metadata records in XML, so the described use case is somewhat dubious.

It's not unheard of for XML metadata records to point to PDF and Word documents :-).

aisaac commented 3 years ago

I am not sure I understand "just a recipe issue so I can point to an example when persuading people to make the switch." Are you saying that there are organizations interested in the Change API, but they would like to apply it to non-IIIF resources, and the solution would be to persuade them to change to IIIF? I'm not sure such general advocacy should be in scope for this TSG. Especially if we have to persuade organizations who host Word documents :-)

zimeon commented 3 years ago

> PMH also does not support PDFs or Word documents, or indeed anything other than metadata records in XML, so the described use case is somewhat dubious.

> It's not unheard of for XML metadata records to point to PDF and Word documents :-).

I agree that OAI-PMH records may point to resources rather than XML metadata but you get into various complications with what dates apply to what. See, for example, http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html for a discussion of resource harvesting using OAI-PMH. OAI-PMH is outdated (see e.g. https://www.slideshare.net/simeonwarner/mind-the-gap-77336241) and Activity Streams properly solves the problem of which dates and types apply to which objects and documents through the indirection of Activity objects and, as necessary, Link objects.

Since IIIF Discovery is a profile of Activity Streams, it seems that the straightforward answer to "how do I expose non-IIIF resources in a compatible way?" is "use Activity Streams". The current specification allows for mixed or parallel streams by saying that object types SHOULD be Manifest or Collection; they could also be type="Document" or whatever else, following the patterns given in Activity Streams, such as Example 111.
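As a rough illustration of such a mixed stream, here is a minimal Python sketch that builds one page holding a IIIF Manifest activity next to a PDF activity. Everything in it is an assumption made for illustration: the example.org URLs, the `make_activity` helper, the timestamps, and the choice of `mediaType` (taken from the Activity Streams vocabulary) to carry the MIME type. It is not an agreed pattern from this thread.

```python
import json


def make_activity(activity_id, activity_type, object_id, object_type,
                  end_time, media_type=None):
    """Build one activity as a plain dict (hypothetical helper)."""
    obj = {"id": object_id, "type": object_type}
    if media_type is not None:
        # Assumption: "mediaType" is borrowed from the Activity Streams
        # vocabulary; the thread does not settle which key to use.
        obj["mediaType"] = media_type
    return {
        "id": activity_id,
        "type": activity_type,
        "object": obj,
        "endTime": end_time,
    }


# All URLs are placeholders; "@context" is omitted to keep the sketch short.
page = {
    "id": "https://example.org/activity/page-0",
    "type": "OrderedCollectionPage",
    "orderedItems": [
        make_activity(
            "https://example.org/activity/1", "Create",
            "https://example.org/iiif/book1/manifest", "Manifest",
            "2021-01-06T12:00:00Z",
        ),
        make_activity(
            "https://example.org/activity/2", "Update",
            "https://example.org/docs/report.pdf", "Document",
            "2021-01-07T09:30:00Z",
            media_type="application/pdf",
        ),
    ],
}

print(json.dumps(page, indent=2))
```

The only difference between the two entries is the object's type and the extra media type, which is what lets both kinds of resource share one stream.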

I agree with Rob's comment (https://github.com/IIIF/discovery/issues/82#issuecomment-740162602) that a note or recipe explaining this would be useful (for data providers to see what to do, and for aggregators to know what to ignore if they are looking only for IIIF resources). I also agree that it shouldn't be in the specification because that would be a significant expansion of scope and would dilute the key message.
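On the aggregator side, "knowing what to ignore" could be as simple as filtering on the object type. The sketch below assumes activities are already parsed into Python dicts with an embedded `object`; the `iiif_only` function, the `IIIF_TYPES` set and the sample data are hypothetical names and values, not part of any specification.

```python
# Object types an IIIF-only harvester keeps (assumption: just these two).
IIIF_TYPES = {"Manifest", "Collection"}


def iiif_only(activities):
    """Return only the activities whose object is a Manifest or Collection."""
    kept = []
    for activity in activities:
        obj = activity.get("object")
        # Skip activities with a missing or non-dict object, and any
        # object type outside IIIF_TYPES (e.g. "Document" for a PDF).
        if isinstance(obj, dict) and obj.get("type") in IIIF_TYPES:
            kept.append(activity)
    return kept


# Placeholder activities for illustration only.
sample = [
    {"id": "https://example.org/activity/1", "type": "Create",
     "object": {"id": "https://example.org/iiif/book1/manifest",
                "type": "Manifest"}},
    {"id": "https://example.org/activity/2", "type": "Update",
     "object": {"id": "https://example.org/docs/report.pdf",
                "type": "Document"}},
    {"id": "https://example.org/activity/3", "type": "Update",
     "object": {"id": "https://example.org/iiif/top",
                "type": "Collection"}},
]

for activity in iiif_only(sample):
    print(activity["id"], activity["object"]["type"])
```

An aggregator that also wants PDFs would widen the set of accepted types rather than change the stream format.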

azaroth42 commented 3 years ago

> It's not unheard of for XML metadata records to point to PDF and Word documents :-)

Indeed (and see Simeon's response to why that's not a great idea) but it's also not unheard of for IIIF Manifests to refer to PDFs and other content documents ... so they should just implement IIIF, thereby solving their problem in the same way as they use PMH, and without us having to describe how to implement Discovery for non-IIIF resources.

azaroth42 commented 3 years ago

Recipe proposal from 2021-01-06 call:

Spec:

Registry:

azaroth42 commented 3 years ago

Related to #12

glenrobson commented 3 years ago

Here is an example containing a link to a PDF document and a link to a IIIF Image (without a manifest) for discussion:

https://glenrobson.github.io/iiif_stuff/activities/non-iiif.json

Questions this raises are:

kirschbombe commented 3 years ago

PDF MIME type is application/pdf?

azaroth42 commented 3 years ago

Call of 2021-02-03: Considered done for the spec. Close when merged. Move recipe issue to cookbook repo

glenrobson commented 3 years ago

Made the following changes to the example (which will be added to the new recipe issue):

For PDF example: