barser closed this issue 3 years ago
Hi @barser,
Most of the information in info.json is page-specific. Different pages may have different dimensions, tile sizes, etc. So, adding a pageCount key wouldn't really be "correct."
I think it would be OK to add the page count to the delegate object context. You could then expose it in info.json using the extra_iiif_information_response_keys() delegate method. I'd worry about clients becoming dependent on that, though.
For a more correct solution, maybe Cantaloupe could do something like what it does with scale constraints and recognize a special suffix on the identifier to indicate a page number, instead of using a page query argument. That wouldn't give you the page count, but it would ensure that all of the pages' info.jsons are correct. The page count could be exposed as described above.
But what does a client do with pageCount? It would need to know that the identifier can be manipulated in a certain way to get a different page. Maybe instead there should be links to all of the other pages in every page's info.json.
I would be interested to know if IIIF has offered guidance on this use case. I encourage you and others with the same use case to voice it to the designers of the Image API.
In IIIF 3.0 there is a partOf linking property, which I presume would link to a Presentation API manifest that contains all the pages of the TIF/PDF. I assume that is the direction this would take (?)
@adolski we assume Cantaloupe deals only with the Image API, which is great and more than good enough. So for the PDF use case (and the cool fact that we can request pages of PDFs via this server) we use pdfinfo to extract the number of pages up front for the file served and store it alongside our metadata. Using that, we build our IIIF Presentation API manifest, appending the page number to the same file name as an argument. It's quite simple really; I'm pretty sure most people using Cantaloupe's PDF capabilities do something similar.
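A minimal sketch of the workflow described in this comment, assuming pdfinfo (from poppler-utils) is on the PATH and prints its usual "Pages:" line. The function names and the example base URI are hypothetical and are not taken from Archipelago's code.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output: str) -> int:
    """Extract the page count from the text printed by pdfinfo."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def count_pages(pdf_path: str) -> int:
    """Run pdfinfo on a file and return its page count."""
    result = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_page_count(result.stdout)


def page_image_urls(image_base_uri: str, page_count: int) -> list:
    """Build one full-size image URL per page, using the ?page argument."""
    return [
        f"{image_base_uri}/full/full/0/default.jpg?page={n}"
        for n in range(1, page_count + 1)
    ]
```

The page count would be computed once at ingest time and stored with the rest of the metadata, so manifest generation never has to open the PDF.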
Hi @adolski,
For a more correct solution, maybe Cantaloupe could ... recognize a special suffix on the identifier to indicate a page number, instead of using a page query argument
It would be very useful! Moreover, such a compound identifier would not violate the IIIF Image API identifier specification, unlike the page GET parameter.
If I place the link https://{server}/iiif/2/{id}?page={n} to page n of the PDF with ID id in the IIIF Presentation API manifest, then viewers like Mirador or UniversalViewer generate metadata requests for the page like GET https://{server}/iiif/2/{id}?page={n}/info.json and tile requests like GET https://{server}/iiif/2/{id}?page={n}/full/,165/0/default.jpg, which is completely wrong.
But if the link to page n of the PDF with ID id looked like, for example, https://{server}/iiif/2/{id}_p{n} or https://{server}/iiif/2/{id}_page_{n}, then it would satisfy the IIIF Image API and viewers would generate correct links.
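To illustrate the failure described above, here is a small sketch of how a client derives the information request from a service URI by plain string concatenation. The helper name and the example.org URLs are made up for the illustration; real viewers have their own code, but the effect is the same.

```python
from urllib.parse import urlsplit


def info_json_url(service_id: str) -> str:
    """Append /info.json to the image's base URI, as Image API clients do."""
    return service_id.rstrip("/") + "/info.json"


# Page number as a GET parameter: the suffix lands inside the query string.
broken = info_json_url("https://example.org/iiif/2/doc.pdf?page=3")
# Page number as part of the identifier: the suffix lands in the path.
working = info_json_url("https://example.org/iiif/2/doc.pdf_p3")

for url in (broken, working):
    parts = urlsplit(url)
    print(f"path={parts.path!r} query={parts.query!r}")
```

In the first case the server sees the path /iiif/2/doc.pdf with a query of page=3/info.json, so it never receives an info.json request at all.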
Hi @DiegoPino,
... we build our IIIF Presentation API Manifest, appending to the same file name the page number as argument
Could you share an example of your Presentation API manifest, please? When I create a manifest with page numbers as GET parameters, viewers can't process the links correctly - see the example in this post.
@mitring I'm not as skilled as @DiegoPino, but our Archipelago Presentation API probably works because the resource id includes the page number as a GET parameter, like this:
"canvases": [
{
"@id": "http://archipelago.byterfly.eu/node/29/iiif/b14b588e-c335-4df7-ae6d-3ba2a831c714/canvas/p1",
"@type": "sc:Canvas",
"label": "p. 1",
"width": 3,
"height": 4,
"images": [{
"@type": "oa:Annotation",
"motivation": "sc:painting",
"resource":{
"@id": "http://archipelago.byterfly.eu/iiif-server/iiif/2/9d8%2Fapplication-conf16-selectedpapers-11-ceregato-et-al-b14b588e-c335-4df7-ae6d-3ba2a831c714.pdf/full/full/0/default.jpg?page=1",
"@type": "dctypes:Image",
"format": "image/jpeg",
Hi @giancarlobi,
Thank you for your reply! But in your example you specify a link to a concrete full-size image in the @id attribute. I don't claim that it's wrong, but I want to specify a link from which Image API-compatible viewers can derive links to different tiles, possibly rotated or greyscaled. For example:
{
"@id": "http://{server}/iiif/2/2722/canvas/page09",
"@type": "sc:Canvas",
"label": "Page 9",
"width": 1240,
"height": 1754,
"images": [
{
"@id": "http://{server}/iiif/2/2722/annotation/page09",
"@type": "oa:Annotation",
"motivation": "sc:painting",
"resource": {
"@id": "http://{server}/iiif/2/2722?page=9",
"@type": "dctypes:Image",
"format": "image/jpeg",
"width": 1240,
"height": 1754,
"service": {
"@context": "http://iiif.io/api/image/2/context.json",
"@id": "http://{server}/iiif/2/2722?page=9",
"profile": "http://iiif.io/api/image/2/level2.json"
}
},
"on": "http://{server}/iiif/2/2722/canvas/page09"
}
]
}
And in such a case the page parameter breaks compatibility with the IIIF Image API specification.
@mitring you are right; maybe @DiegoPino has some more notes to add. IMHO your idea for a suffix style like {id}_page_{n} would be very useful, as @adolski also suggested in his post.
@mitring sorry, late to the party here. Time zone difference!
As @giancarlobi correctly said, we do generate the IIIF manifests for PDFs via this property, but as you have clearly detected too, the API specs on one side and the pretty liberal interpretation of them on the client side (the viewers) make this quite complex to process. I cannot remember where in the specs it says that URL arguments are not allowed (if you can point me to it, that would be great). I was under the impression that, given the original nature of the manifests (in v2.x it was clearly JSON-LD; now in 3.0 it is more "it depends on you how to interpret it, pure JSON or not"), any @id just needs to be a valid IRI. One problem was that a few versions ago (fixed now by @adolski!) GET arguments were not passed to the info.json, so page increments would not deliver a new size and actual image URL with every ?page argument change. That is fixed in the latest version in the 4.1.x series here.
For that reason and others (clients and viewers each doing things differently) we decided to build different types of dynamically generated IIIF manifests (v3) depending on the need. Specifically, the PDF one that uses the page argument serves images directly, without a service definition, to avoid this whole problem. So it is still spec compliant, but yes, no black and white or rotation possible.
But, that said, we have another ongoing discussion with some boilerplate code that is specific to our needs but can really be applied to any local solution: a URL wrapper logic around this to make API clients happy and, of course, also Cantaloupe. Here is the comment. NOTE: see in the same issue discussion also how Mirador 3 has fixed the lack of static image support!
Basically we (I) have planned for a few proxy endpoints/URLs that wrap the Cantaloupe ones and do exactly what @adolski suggests: move arguments into IDs. Locally those are then split and processed, an internal call to Cantaloupe is made, and the resulting JSON is altered to get a correct info.json that is capable of what you need.
I know it sounds like a hack, but on the other hand it gives us (or, put differently, would give us once done) more control over this and other arguments we might need to pass into the ID.
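A rough sketch of the path-mapping step such a proxy would need, assuming a hypothetical public URL shape of /iiif/2/{id}_p{n} and a Cantaloupe upstream that still takes ?page={n}. The function names are invented for the illustration; this is not the Archipelago implementation, and the HTTP plumbing is left out.

```python
import re

# Hypothetical public shape: /iiif/2/{id}_p{n} followed by an optional
# Image API tail such as /info.json or /full/full/0/default.jpg.
_PAGE_SUFFIX = re.compile(
    r"^(?P<prefix>/iiif/2/)(?P<id>[^/]+?)_p(?P<page>\d+)(?P<rest>/.*)?$"
)


def to_upstream(public_path: str) -> str:
    """Move the page number out of the identifier and into a ?page argument."""
    m = _PAGE_SUFFIX.match(public_path)
    if m is None:
        return public_path  # no page suffix, pass through unchanged
    rest = m.group("rest") or ""
    return f"{m.group('prefix')}{m.group('id')}{rest}?page={m.group('page')}"


def patch_info(info: dict, public_base_uri: str) -> dict:
    """Return a copy of an Image API 2 info.json whose @id is the public URI,
    so that clients keep building requests against the proxy."""
    patched = dict(info)
    patched["@id"] = public_base_uri
    return patched
```

For example, to_upstream("/iiif/2/doc.pdf_p3/info.json") yields "/iiif/2/doc.pdf/info.json?page=3", while a path without the suffix is returned as-is.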
I feel a good way of doing almost the same directly in Cantaloupe, if you don't want your own proxy preprocessing Cantaloupe endpoints, would be to allow a request preprocessor in Cantaloupe that uses the same delegate system/Ruby processing: a way of processing/splitting IDs before the actual call is made, and then allowing the delegate to call Cantaloupe again from inside. Not sure if I'm explaining myself; think of a pre-handler for the request. That way the ID and how it's formed (with an extra ?page or whatever you want) can be customized by each implementation and then routed back to a normal Cantaloupe endpoint (e.g. with a ?page at the end of the URI).
Side note: I feel there is a larger issue in how the specs expect properties like size (width/height) to always be there, versus the fact that they also depend on the info.json/service, given that the latter can provide those dimensions/proportions. That already makes our dynamic IIIF manifest generation quite processing heavy, and me not happy.
Hi @DiegoPino,
Thanks for your detailed answer! Here is my "five pennies" on some statements.
I can not remember where in the specs (if you can point me to it it would be great), it says that URL arguments are not allowed
There is no strict prohibition, but it's clear from the context of Chapter 2 of the IIIF Image API. Here are some examples:
The IIIF Image API can be called in two ways ... Both convey the request’s information in the path segments of the URI, rather than as query parameters.
... image’s base URI ... constructed according to the following URI Template:
{scheme}://{server}{/prefix}/{identifier}
The IIIF Image API URI for requesting an image MUST conform to the following URI Template:
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
The URI for requesting image information MUST conform to the following URI Template:
{scheme}://{server}{/prefix}/{identifier}/info.json
So there is simply no place for GET parameters, and a URI like http://{server}/iiif/2/{id}?page={n} is treated by viewers (and the Image API specs) as a link to an image with the identifier {id}?page={n}. So viewers, following the Image API specs, correctly try to retrieve information about the image with a GET http://{server}/iiif/2/{id}?page={n}/info.json request, and fail.
One problem was that a few versions ago (fixed now by @adolski!) GET arguments were not passed to the info.json, so page increments would not deliver a new size and actual image URL with every ?page argument change. That is fixed in the latest version in the 4.1.x series here.
Yes, I also noticed that the page argument does not affect the content of info.json, but that fix is in version 4.1.6, which isn't released yet.
An URL wrapper logic around this to make API client happy and of course also Cantaloupe ... Basically we (i) have planned for a few proxy endpoint/URLS that wrap cantaloupe ones and do exactly what @adolski suggests ... I know it sounds like a hack ...
I am also working in this direction now, trying to set up URL rewriting in Nginx that proxies requests to Cantaloupe. But you are right, this is a hack :)
Not sure if i explain myself, like a pre handler for the request.
I got the idea, thank you. It sounds cool. Moreover, we already have something like that: ScriptLookupStrategy for sources. For example, S3Source with ScriptLookupStrategy converts the id from the URI to a bucket and object key in S3 storage. If we could add information about the page number to the result of the converter's method call, the issue would be resolved.
Hi @mitring, thanks. Yes, it's pretty much the same use case we have.
I feel this statement here, which is the one that is really complicating the issue, is wrong in terms of how URIs, arguments and protocols work (RFC specs):
So there is simply no place for GET parameters, and URI like http://{server}/iiif/2/{id}?page={n} is treated by viewers (and Image API specs) as link to image with identifier {id}?page={n}. So viewers according to Image API specs correctly try to retrieve information about the image with GET http://{server}/iiif/2/{id}?page={n}/info.json request, and fails.
My take is that ?page is a GET argument and cannot/should never be made part of the ID. The ID is part of the path and is processed via a pattern. Even in cases where you have servers set up (like we do in PHP) to convert GET arguments into slash-separated path segments/parts, everything after the ? should either be processed differently or, worst case, discarded. Web servers do that, Nginx will do that, even JS would do that, so why would a spec not do that? What I am saying is that viewers are getting this wrong or the spec is not explicit enough.
I got the idea, thank you. It sounds cool, moreover - we already have something like that: ScriptLookupStrategy for sources. For example, S3Source with ScriptLookupStrategy converts id from URI to bucket and object key in S3 storage. If we could add information about page number in result of converter's method call, then issue would be resolved.
Yes, I agree. I wish there were other options, but they could be complex to enforce in simple code implementations (where calling a URI excels). For example, instead of using GET arguments we could use HTTP headers, but then there is no way to pass headers from a manifest! That is another issue with a purely URI-based document, which also means we need to be able to use GET. We have used headers many times when we needed backend authentication to retrieve images from their source, but never from inside a manifest, of course.
My conclusion is: it's a little bit complex to demand this change here, and I'm not even sure I could make a point (like asking please!) without also asking client writers/viewer implementers and the IIIF API specs committee to clarify what place GET has in their API.
I've been doing a little research into this particular issue. If I may, I would like to suggest following the kind of naming convention defined for URNs: https://en.wikipedia.org/wiki/Uniform_Resource_Name
It is defined as urn:<NID>:<NSS>. We wouldn't necessarily prefix it with urn:, and in our case <NID> doesn't necessarily make sense. But the <NSS> is our identifier, with a sub-delims character separating the page number. These are defined as sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=". The two that make the most sense to me are , or ;.
An identifier could be /iiif/2/item.pdf,3/full/full/0/default.jpg or /iiif/2/item.pdf;3/full/full/0/default.jpg
Thoughts?
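As a quick illustration of the proposal, the sketch below splits an identifier at the last occurrence of a delimiter and treats a purely numeric tail as the page number. The function name is hypothetical and the choice of ; as the default is just one of the two candidates mentioned above.

```python
def split_page_suffix(identifier: str, delimiter: str = ";"):
    """Split 'item.pdf;3' into ('item.pdf', 3).

    Returns (identifier, None) when there is no numeric suffix, so plain
    identifiers keep working unchanged.
    """
    base, sep, suffix = identifier.rpartition(delimiter)
    if sep and base and suffix.isascii() and suffix.isdigit():
        return base, int(suffix)
    return identifier, None


print(split_page_suffix("item.pdf;3"))
print(split_page_suffix("item.pdf,3", delimiter=","))
print(split_page_suffix("item.pdf"))
```

Splitting at the last delimiter means an identifier that itself contains the delimiter still resolves, as long as the page number comes last.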
@mitring
And in such case page parameter breaks the compatibility with IIIF Image API specification.
My solution was to use the Level 0 support in the viewer (Mirador 3 in my case). A resource without a service does not break IIIF API compatibility.
{
"@id": "https:\/\/my.server\/ein3ft\/ri-138226\/canvas-st16410385-page-3",
"@type": "sc:Canvas",
"height": 1600,
"width": 1600,
"label": "compressed.tracemonkey-pldi-09.pdf",
"thumbnail": {
"@id": "https:\/\/my.server\/image416\/iiif\/2\/fgfg5h4f%2Fmain%2Fr%2Fop%2F71u%2Frop71u2n42gj.pdf\/full\/,100\/0\/default.jpg?page=3"
},
"images": [
{
"@type": "oa:Annotation",
"motivation": "sc:painting",
"on": "https:\/\/my.server\/ein3ft\/ri-138226\/canvas-st16410385-page-3",
"resource": {
"@id": "https:\/\/my.server\/image416\/iiif\/2\/fgfg5h4f%2Fmain%2Fr%2Fop%2F71u%2Frop71u2n42gj.pdf\/full\/full\/0\/default.jpg?page=3",
"@type": "dctypes:Image",
"format": "pdf",
"height": 1600,
"width": 1600
}
}
]
}
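A canvas like the one above can be generated per page. The following is a minimal sketch, assuming Presentation API 2 structures as shown in this thread; the function name, argument names and example.org URLs are hypothetical. The image resource deliberately has no service block, so viewers treat it as a static image and never try to build an info.json URL from it.

```python
import json
from urllib.parse import quote


def static_page_canvas(image_base, manifest_base, identifier, page, width, height):
    """Build a Presentation API 2 canvas whose image is a fixed full-size
    rendering of one PDF page, with no Image API service attached."""
    encoded_id = quote(identifier, safe="")  # slashes become %2F
    image_url = f"{image_base}/{encoded_id}/full/full/0/default.jpg?page={page}"
    canvas_id = f"{manifest_base}/canvas/p{page}"
    return {
        "@id": canvas_id,
        "@type": "sc:Canvas",
        "label": f"p. {page}",
        "width": width,
        "height": height,
        "images": [
            {
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": image_url,
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": width,
                    "height": height,
                },
            }
        ],
    }


canvas = static_page_canvas(
    "https://example.org/iiif/2", "https://example.org/manifest/1",
    "dir/doc.pdf", 3, 1600, 1600,
)
print(json.dumps(canvas, indent=2))
```

The trade-off is the one already noted in the thread: without a service, the viewer cannot request tiles, rotation or other qualities.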
I'm tinkering with something for this in my fork that will allow a user to set a property called page_number.delimiter in the cantaloupe.properties file. If they set this property, the PublicResource abstract class will look for the page number in the Identifier URI path component rather than from a query string parameter. The goal is to allow you to do something like this:
page_number.delimiter = ;p
And you would reference page numbers in a PDF via this format:
/iiif/2/filename.pdf;p12/full/full/0/default.jpg
If you leave the property in cantaloupe.properties blank, it works just as it does today with ?page=12
I still have some test failures to work through and I want to be able to test more different scenarios but I think it will work. I based most of the code on the ScaleConstraint code which works pretty much the same. It should work just fine with both settings.
@cmhdave, it looks like we started working on this around the same time. :smile:
The identifier path component needs to support three things, currently: an identifier, a page number, and/or a scale constraint, and the image server needs to be able to transform it not only from its component parts, but also to them (in order to support generating URIs).
Version 4.1 already supports a scale constraint suffixed to an identifier. I'm thinking that I will phase out the "suffix" terminology and replace it with the concept of a "meta-identifier" which consists of those components. So, it can be said that the "identifier path component" may contain either an identifier or a meta-identifier.
As for how a meta-identifier is formatted, there are two main options, configurable via a meta_identifier.transformer key:
StandardMetaIdentifierTransformer suffixes a page number and/or scale constraint to the identifier similar to how the scale constraint works now. This transformer supports a meta_identifier.transformer.StandardMetaIdentifierTransformer.delimiter configuration key, with which the delimiter/separator is configurable. By default, the meta-identifier of page 3 of a PDF would look like: document.pdf;3 (props to @cmhdave for the idea of URN-compliant identifiers)
file.pdf;p3;s1:2
) in case any more components come up in the future that would introduce ambiguity. I haven't implemented that yet, though.
DelegateMetaIdentifierTransformer enables full control over the transformation via two new delegate methods: deserialize_meta_identifier(String) and serialize_meta_identifier(Hash<String,Object>). There is also a new page_number key in the delegate context to accompany the identifier and scale_constraint keys that were already there. This transformer is sort of based on @DiegoPino's idea above.
That is the meat of it, I think. I'm open to feedback on this approach. I tried to come up with a solution that is simple out-of-the-box but offers precise control when needed.
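To make the two-way transformation concrete, here is a sketch of the round trip in Python. It only illustrates the logic: the real delegate methods are written in the delegate script's language, and the exact string layout (identifier;page;numerator:denominator) and the list form of scale_constraint are assumptions for this example, not a statement of what Cantaloupe emits.

```python
import re

# Assumed layout: identifier, then an optional ;page, then an optional ;n:d
_META = re.compile(
    r"^(?P<identifier>.+?)"
    r"(?:;(?P<page>\d+))?"
    r"(?:;(?P<num>\d+):(?P<den>\d+))?$"
)


def deserialize_meta_identifier(meta_identifier: str) -> dict:
    """Split a meta-identifier string into its component parts."""
    m = _META.match(meta_identifier)
    if m is None:
        raise ValueError(f"cannot parse meta-identifier: {meta_identifier!r}")
    result = {"identifier": m.group("identifier")}
    if m.group("page"):
        result["page_number"] = int(m.group("page"))
    if m.group("num"):
        result["scale_constraint"] = [int(m.group("num")), int(m.group("den"))]
    return result


def serialize_meta_identifier(components: dict) -> str:
    """Join the component parts back into a meta-identifier string."""
    parts = [components["identifier"]]
    if components.get("page_number") is not None:
        parts.append(str(components["page_number"]))
    if components.get("scale_constraint"):
        num, den = components["scale_constraint"]
        parts.append(f"{num}:{den}")
    return ";".join(parts)
```

Both directions are needed because the server has to parse incoming request paths and also generate URIs, for example the id reported in an information response.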
I'm honestly happy that you are tackling this @adolski. Even though what I have is working I wasn't confident that I didn't break anything with the scale constraint. (The tests I wrote with the different variations passed but who knows what I might have broken in a live environment). That and I couldn't shake the feeling that the way I was doing it was kind of "hacky" and yours sounds like a more robust solution. I look forward to trying yours out!
The meta-identifier feature is on develop now. I hope it works. 😰 Good luck!
No worries @cmhdave, I felt the same way as I was working on this. The "scale constraint suffix" stuff was hacky to begin with. Hopefully what's in place now is a little bit better. Also, in the end, it was a lot more work than I thought it would be.
This thread was originally talking about a page count in information responses. I don't want to add that by default, but there is now a page_count key in the delegate context. You can implement extra_iiif_information_response_keys() and do whatever you want with it.
I'm going to close this issue as I think it's done now, more or less, but feel free to reopen if you find otherwise.
Hello, team!
Please consider adding an additional section to info.json with the page count for multi-page formats such as TIF/PDF.
I believe it could be very useful in some scenarios.
Thank you!