Implementing "paged" Collections in Presentation API 3.0 for a IIIF fluent search response

mathewjordan commented 1 year ago

Problem

In https://iiif.io/api/presentation/3.0/#51-collection, one usage of a Collection could be to allow clients to:

Receive a set of Manifests, such as search results, for rendering.

An organization with a large number of Manifests and Collections representing objects is seeking to deliver responses as a Presentation API 3.0 Collection that transforms the results of queries against its search index. The total number of items in these Collections could vary widely, so a flat Collection is not feasible. To minimize payload, the organization wants a scalable, viewer-agnostic way to implement some form of paging. With the paging model deprecated in 3.0, how would one implement an API response that viewers could digest? How can behavior be used to indicate to a viewer the organization of the paging?

Potential valid methods

Two methods I have thought of to solve this that are still within spec:

A) Last Item Entry is Next Page

Response is Collection with a number of items, and the last item is the "Next Page." Top-level response acts as first page.

Pros

Scalable for large responses
Follows a scrolling model

Cons

Viewer may need to assume the last item is not part of the "Page" but rather an indicator to request its id and render additional item entries
No easy way to indicate number of pages
Previous page is perhaps unknown
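
For illustration, a minimal Python sketch of what a method (A) response could look like, built as a plain dict. The `build_scrolling_page` helper, the `example.org` URLs, the `?page=` parameter, and the "Next page" label are all assumptions made up for this example; nothing in the specification marks the last item as a continuation, which is exactly the first con above.

```python
import json

CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_scrolling_page(base_url, manifests, page, page_size):
    """Build one Presentation 3.0 Collection for method (A).

    `manifests` is a list of (id, label) tuples and `page` is zero-based.
    If more results remain, the final entry in `items` is a reference to
    the Collection holding the next slice (a convention, not a spec rule).
    """
    start = page * page_size
    chunk = manifests[start:start + page_size]
    items = [
        {"id": mid, "type": "Manifest", "label": {"en": [label]}}
        for mid, label in chunk
    ]
    if start + page_size < len(manifests):
        # Hypothetical "next page" stand-in Collection.
        items.append({
            "id": f"{base_url}?page={page + 1}",
            "type": "Collection",
            "label": {"en": ["Next page"]},
        })
    return {
        "@context": CONTEXT,
        "id": f"{base_url}?page={page}",
        "type": "Collection",
        "label": {"en": [f"Results, page {page + 1}"]},
        "items": items,
    }


manifests = [(f"https://example.org/manifest/{i}", f"Item {i}") for i in range(5)]
first = build_scrolling_page("https://example.org/search", manifests, 0, 2)
print(json.dumps(first, indent=2))
```

Note that the response carries no total count and no link back to the previous page; a client only learns about one further page at a time.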

B) Top Collection as Page Index

Response is a Collection where the top Collection acts as an index of pages. Each of the top Collection's items is a Collection as well.

Pros

Page collections contain items to be rendered to client
Hierarchical structure allows for partOf property to reference top-level Collection for response
Number of pages is available from the length of the top-level items

Cons

Viewer may need to assume that the top-level Collection items are not rendered to the screen
Scalability at large number of pages becomes an issue
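
A comparable sketch for method (B), again as plain Python dicts. The helper names, the `/page/{n}` URL pattern, and the labels are assumptions for illustration only; the `partOf` shape (a list of objects with `id` and `type`) follows the Presentation 3.0 property definition.

```python
import math

CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_page_index(base_url, total_items, page_size):
    """Top-level Collection whose items are references to page Collections."""
    pages = math.ceil(total_items / page_size)
    return {
        "@context": CONTEXT,
        "id": base_url,
        "type": "Collection",
        "label": {"en": ["Search results"]},
        "items": [
            {
                "id": f"{base_url}/page/{n}",
                "type": "Collection",
                "label": {"en": [f"Page {n} of {pages}"]},
            }
            for n in range(1, pages + 1)
        ],
    }


def build_page(base_url, manifest_ids, n, page_size):
    """One page Collection (1-based `n`) that points back to the index."""
    chunk = manifest_ids[(n - 1) * page_size:n * page_size]
    return {
        "@context": CONTEXT,
        "id": f"{base_url}/page/{n}",
        "type": "Collection",
        "label": {"en": [f"Page {n}"]},
        "partOf": [{"id": base_url, "type": "Collection"}],
        "items": [{"id": mid, "type": "Manifest"} for mid in chunk],
    }


ids = [f"https://example.org/manifest/{i}" for i in range(5000)]
index = build_page_index("https://example.org/search", len(ids), 100)
page_two = build_page("https://example.org/search", ids, 2, 100)
print(len(index["items"]), len(page_two["items"]))
```

The index grows by one entry per page, so the scalability con above shows up directly in the length of `index["items"]`.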

glenrobson commented 1 year ago

Related discussion on slack: https://iiif.slack.com/archives/C0AL6BU76/p1668604495246499

azaroth42 commented 1 year ago

The intent for deprecating paging was that collections, as part of the presentation API, are not semantic constructions and only exist to facilitate the construction of a user interface. As such, if there are "too many" items in a collection, then simply split up the collection into sections however the publisher desires. For example, a collection of 5000 manifests could be split into 50 "pages" of 100 manifests, each given a label to let the user know what's in it.

But I agree that this doesn't help for Very Extensive search results... which is explicitly in scope for use of Collections where, for example, a result of 10 million manifests split into pages of 100 manifests would then have 100,000 sub-collections and a viewer might think it should render ALL of them in a list. That said, UIs could be constructed with different nestings at the same time and different numbers of hits in a page. For example the first page has 10 items for speed, the second through 10th have 25 items, and then there's an 11th collection which has further sub-collections of pages of 100 items each. The advantage of the current spec is that implementations are not forced into any particular model for results paging.
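
To make the mixed-nesting example concrete, here is a small sketch that only computes the offset ranges for such a layout: one page of 10, nine pages of 25, then an 11th collection holding pages of 100. The `plan_layout` function and its parameter names are hypothetical; a publisher could pick any other split.

```python
def plan_layout(total, first_size=10, mid_size=25, mid_pages=9, deep_size=100):
    """Return (top_pages, nested_pages) as lists of (start, end) offsets.

    `top_pages` are the small pages listed directly in the top Collection;
    `nested_pages` would live inside one further sub-Collection.
    """
    sizes = [first_size] + [mid_size] * mid_pages
    top, offset = [], 0
    for size in sizes:
        if offset >= total:
            break
        end = min(offset + size, total)
        top.append((offset, end))
        offset = end
    nested = [
        (start, min(start + deep_size, total))
        for start in range(offset, total, deep_size)
    ]
    return top, nested


top, nested = plan_layout(1000)
print(len(top), "top-level pages,", len(nested), "nested pages")
```

With the defaults, the first ten pages cover 10 + 9 × 25 = 235 items, and everything after that falls into the nested pages of 100.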

So ... my suggestion is that rather than trying to standardize a model, instead it would be more beneficial to enumerate difficulties with the specification for building UIs for search results. At which point we can look at why those difficulties exist, and how we might be able to solve them.

For example, could a no-nav like behavior for Collections provide some mitigation to the current issue?

azaroth42 commented 5 months ago

Propose removing the mention of "search results" from the collection use cases to avoid this confusion about paging. There isn't a search API that could return manifests, and if there was, it would probably look like the change discovery API format which does have paging.

azaroth42 commented 5 months ago

Should editorially call out the AS pattern for search results as per change discovery API for folks to consider for their own search results APIs (such as e.g. https://linked.art/api/1.0/ecosystem/search/ )
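
A rough sketch of what that Activity Streams pattern could look like for search results. The property names (`OrderedCollection`, `OrderedCollectionPage`, `totalItems`, `first`, `last`, `prev`, `next`, `partOf`, `orderedItems`) come from Activity Streams 2.0 as used by the Change Discovery API. Putting Manifest references directly into `orderedItems`, the helper names, and the URLs are assumptions for illustration and are not defined by any IIIF specification; `@context` is left out for the same reason.

```python
import math


def page_count(total, page_size):
    """Number of pages, with at least one page for an empty result."""
    return max(1, math.ceil(total / page_size))


def as_collection(base_url, total, page_size):
    """Top-level OrderedCollection: total count plus first and last pages."""
    pages = page_count(total, page_size)
    return {
        "id": base_url,
        "type": "OrderedCollection",
        "totalItems": total,
        "first": {"id": f"{base_url}/page/1", "type": "OrderedCollectionPage"},
        "last": {"id": f"{base_url}/page/{pages}", "type": "OrderedCollectionPage"},
    }


def as_page(base_url, manifest_ids, n, page_size):
    """One OrderedCollectionPage (1-based `n`) with prev/next links."""
    pages = page_count(len(manifest_ids), page_size)
    chunk = manifest_ids[(n - 1) * page_size:n * page_size]
    page = {
        "id": f"{base_url}/page/{n}",
        "type": "OrderedCollectionPage",
        "partOf": {"id": base_url, "type": "OrderedCollection"},
        # Assumption: results are listed as bare Manifest references.
        "orderedItems": [{"id": mid, "type": "Manifest"} for mid in chunk],
    }
    if n > 1:
        page["prev"] = {"id": f"{base_url}/page/{n - 1}", "type": "OrderedCollectionPage"}
    if n < pages:
        page["next"] = {"id": f"{base_url}/page/{n + 1}", "type": "OrderedCollectionPage"}
    return page


ids = [f"https://example.org/manifest/{i}" for i in range(250)]
top = as_collection("https://example.org/search", len(ids), 100)
middle = as_page("https://example.org/search", ids, 2, 100)
print(top["totalItems"], middle["prev"]["id"], middle["next"]["id"])
```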

mathewjordan commented 5 months ago

The use case of search results is one that is valuable to the IIIF tenets of reuse and remixing. Its inclusion by the editors in both the current and previous specs is one of foresight on the editors' side that highlights the malleability of IIIF APIs. Search results as Collections are currently in use by Northwestern through its API, using the ?as=iiif URL parameter construct: Northwestern API. Removing the language regarding search results does not erase the actual usability concern with large collections. While large collections (whatever number this may be) and search results are different use cases, the Client-side presentation of the Collection resource would be greatly aided in the same way by some mechanism that performantly and programmatically guides it.

Take the David Rumsey Map Collection provided by the Internet Archive: David Rumsey Map Collection. This collection of over one hundred thousand items is provided (as a IIIF Collection) using the (A) method described above. This is not a search result, but a named collection. There is no uniform way for clients to handle a Collection of this size. While it does “page” every 1000 items through the use of a stand-in Collection as the next page for the final item, the structure exploits this Collection resource as a page for this purpose.

Problems with the (A) method:

Client confusion: There is not an easy way for a Client to know that this is the continuation of a flat collection of ~111,792 items, and that this next Collection simply represents the next 1000 items in that Collection. The pattern above works only if the Clients enter the scenario with a bespoke way of handling it. It is not very portable to other IIIF Clients and Viewers in this way.
Total Items obfuscated: There is not an easy way for the Client to gauge the total number of items in this Collection, as it would need to traverse the pattern (of having a Collection as the last item) to retrieve an item count. This likely needs to be a number and not a string, as it should be easily used by the Client without concern for an internationalized string as a label.
Collection purpose blurred: There is not an easy way for the Client to differentiate actual Collections that may be named sub-Collections (such as volumes) of this Collection in contrast to ones being utilized for paging. Using Collections simply for paging without supporting properties to aid its purpose seems to abuse the specification.
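
To show what this costs a Client, here is a sketch of the traversal needed just to count items under the (A) pattern. The `count_manifests` function and the in-memory `PAGES` data are hypothetical, and `fetch` stands in for whatever HTTP call a real Client would make. The "trailing Collection means next page" test is precisely the bespoke assumption described above, and it would misfire on a genuine sub-Collection that happens to come last.

```python
def count_manifests(first_id, fetch):
    """Walk method (A) pages and return (manifest_count, request_count).

    Assumes (without any signal from the data) that a Collection in the
    final position of `items` is the next page rather than real content.
    """
    total, requests, next_id = 0, 0, first_id
    seen = set()
    while next_id is not None and next_id not in seen:
        seen.add(next_id)  # guard against cycles
        page = fetch(next_id)
        requests += 1
        items = page.get("items", [])
        next_id = None
        if items and items[-1].get("type") == "Collection":
            next_id = items[-1]["id"]
            items = items[:-1]
        total += sum(1 for item in items if item.get("type") == "Manifest")
    return total, requests


def manifest(i):
    return {"id": f"https://example.org/manifest/{i}", "type": "Manifest"}


def next_page(n):
    return {"id": f"https://example.org/c?page={n}", "type": "Collection"}


# Made-up data: three pages holding 2, 2 and 1 Manifests.
PAGES = {
    "https://example.org/c?page=0": {"items": [manifest(0), manifest(1), next_page(1)]},
    "https://example.org/c?page=1": {"items": [manifest(2), manifest(3), next_page(2)]},
    "https://example.org/c?page=2": {"items": [manifest(4)]},
}

total, requests = count_manifests("https://example.org/c?page=0", PAGES.__getitem__)
print(total, "manifests after", requests, "requests")
```

One request per page is needed before the total is known, so at 1000 items per page a Collection of roughly 111,792 items would take about 112 requests.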

As Robert Sanderson noted, Activity Streams could be used to support this. However, there is not a friendly IIIF mechanism for doing so within the Presentation API or a supporting extension or service. I understand that institutions could build their own model in parallel for handling this paging challenge. This does not make these Collections useful beyond their intended viewers and Clients. The consequence of this is that Collections will be further complicated to the point that they essentially become less likely to be adopted.

Collections have tremendous value, but it seems their value can only be unlocked if a bespoke Client is tailored to its patterns. It’s clear that at some higher item count, the Presentation API cannot faithfully render Collections to the screen. What purpose do they serve if Collections are limited in this way? This is a human usability concern and one that I argue we should not take lightly if we wish for Collections to be of use to the Client as well as the person viewing the items and not simply be dusty organizational structures existing only as JSON in our institutional APIs.

tristanr-cogapp commented 5 months ago

We discussed the problem of Collections that contain a large number of Manifests in the birds-of-a-feather session at the IIIF conference last week, and I agree with @mathewjordan's points raised above that the current Presentation 3.0 way of handling them is not ideal.

There will always be a legitimate reason to create Collections with 1000s of manifests (e.g. David Rumsey example above), and legitimate client applications that could consume that amount of them (e.g. Canopy IIIF). So it seems a shame that the only way to do so is in a decidedly non-semantic way such as creating a "next set of results" Collection called something like "result set 2 of 902".

I appreciate that there is an alternative answer that says Collections should only be ever used for a limited number of Manifests, and if you wanted to use them for arbitrary purposes such as search results then you can't. However, that feels like passing the buck and risking that each institution would come up with their own format to list 1000s of Manifests instead.

Wouldn't a much simpler and better idea be to allow paging once again? E.g. by optionally allowing the sort of results-oriented parameters that exist in the Linked Art Search Response mentioned by @azaroth42?

azaroth42 commented 5 months ago

This would be a significant breaking change for 3.0 to 4.0 ... @glenrobson @caitlinPerry can we commandeer a community call or two to discuss?

caitlinPerry commented 5 months ago

I'd be happy to do that. We're skipping June, but July is open.

azaroth42 commented 2 weeks ago

Notes doc: https://docs.google.com/document/d/1KhKn8UGVt__T5UExHVCEkpHzEA5ejs0GjmxKctFSGYw/edit?tab=t.0
