IIIF / api

Source for the API and model specification documents
http://iiif.io/api

How to support Structural Navigation #2320

Open · brittnylapierre opened this issue 1 week ago

brittnylapierre commented 1 week ago

Edit: This issue was moved from the Cookbook Recipes repo. We are looking for technical specification writers to help us determine the best solution in the IIIF context for detailed structural navigation. See the big comment I added for detailed context: https://github.com/IIIF/api/issues/2320#issuecomment-2457744843

Recipe Name

Structural navigation using Ranges: 2 Ways

Use case

You can enhance your manifest by adding Ranges that reference specific parts of canvases. By using the label field to describe these references with text, you can create structured navigation similar to a tagged PDF. Learn more about structural navigation here.

Way 1: Attaching Annotations to a Range via a supplementary AnnotationCollection

Example manifest: https://upcdn.io/kW15cD4/raw/html-annots-3-anns-in-range1.json
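A minimal sketch of the shape this takes, with invented ids, labels, and coordinates (the linked manifest above is the authoritative example): the Range points at its text via the supplementary property, and the referenced AnnotationCollection holds supplementing annotations targeting regions of the canvas. In practice the collection is often paged via first/last rather than embedded with items as shown here.

```json
{
  "id": "https://example.org/range/chapter-1",
  "type": "Range",
  "label": { "en": ["Chapter 1"] },
  "supplementary": {
    "id": "https://example.org/annos/chapter-1",
    "type": "AnnotationCollection"
  },
  "items": [
    { "id": "https://example.org/canvas/1", "type": "Canvas" }
  ]
}
```

The AnnotationCollection would then carry the screen-readable content, e.g.:

```json
{
  "id": "https://example.org/annos/chapter-1",
  "type": "AnnotationCollection",
  "label": { "en": ["Chapter 1 text"] },
  "items": [
    {
      "id": "https://example.org/annos/chapter-1/page/1",
      "type": "AnnotationPage",
      "items": [
        {
          "id": "https://example.org/anno/c1-para-1",
          "type": "Annotation",
          "motivation": "supplementing",
          "body": {
            "type": "TextualBody",
            "value": "<p>Opening paragraph of chapter 1…</p>",
            "format": "text/html",
            "language": "en"
          },
          "target": "https://example.org/canvas/1#xywh=100,200,1800,400"
        }
      ]
    }
  ]
}
```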

Pros

Cons

Example of what viewers should support (displaying annotations with the range titles): [screenshot]

Way 2: Using ranges with label fields, referencing positional canvas URLs

Example manifest: https://upcdn.io/kW15cD4/raw/partial-ranges-nav.json (a minimal sketch of this shape follows the screenshot below)

Pros

Cons

Theseus screenshot: [screenshot]
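For reference, a minimal sketch of the Way 2 shape, assuming an xywh fragment on the canvas id (ids and coordinates invented; the linked manifest above is the authoritative example):

```json
{
  "id": "https://example.org/range/intro",
  "type": "Range",
  "label": { "en": ["1. Introduction"] },
  "items": [
    {
      "id": "https://example.org/canvas/1#xywh=100,150,2000,180",
      "type": "Canvas"
    }
  ]
}
```

There is no annotation layer here: the screen-readable text lives only in the label, and a client renders the label as a navigation entry that moves the viewport to the referenced region.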

brittnylapierre commented 1 week ago

Related: https://github.com/IIIF/cookbook-recipes/issues/28

kirschbombe commented 1 week ago

Hi, @brittnylapierre - Thanks for creating the new issue. This is useful for documenting some of the use cases that we should be thinking about for Presi 4 and accessibility. It might be helpful if you could begin the use case with a bit more about what you are trying to achieve and why: more of the problem you are trying to solve rather than the proposed solution. For example, is this to assist screen readers? How is this meant to enhance accessibility?

brittnylapierre commented 1 week ago

Thank you! I will edit it - but for a short answer it is for screen reader accessibility, yes :)

glenrobson commented 1 week ago

Questions:

brittnylapierre commented 1 week ago

Desired outcome of having this data in a manifest

Enable IIIF viewer developers to provide a universal experience of exploring IIIF content, regardless of a user's mode of access. (Ideally, having content rendered as hierarchical, accessible HTML with text-based data.)

The upshot of all the background info below is that people using screen readers currently have to download a tagged PDF to access the content in manifests for library and archive print materials, instead of being able to use that content directly in IIIF viewers from object pages on access websites, like other users.

What is the ideal way for IIIF to support very detailed, hierarchical, screen reader friendly representations of IIIF images within manifests?

What functionality do tagged PDFs give with a screen reader?

About screen readers:

  1. Screen reader users navigate content linearly, moving from one element to the next using arrow keys or the tab key.
  2. HTML is the most compatible format for screen reader technology. Most screen readers can parse HTML effectively, giving users various hotkeys and helpful tools for navigating web pages more efficiently.

Through IIIF, some adopters currently use Ranges to provide table-of-contents functionality, and use annotations created from OCR and AI tools to expose the content of their IIIF images as screen-readable text.
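For example, a single OCR line is typically carried as a supplementing annotation that targets the region of the canvas it transcribes (a sketch with invented ids and coordinates):

```json
{
  "id": "https://example.org/anno/line-12",
  "type": "Annotation",
  "motivation": "supplementing",
  "body": {
    "type": "TextualBody",
    "value": "It was a dark and stormy night.",
    "format": "text/plain",
    "language": "en"
  },
  "target": "https://example.org/canvas/1#xywh=120,480,1600,60"
}
```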

Some adopters also use the rendering field to provide alternative representations of their content, for example tagged PDFs. Tagged PDFs are very labor-intensive to produce, and still less suitable than HTML for screen reader technology.

For all their faults, tagged PDFs, like current annotation-based OCR-correction workflows, give digitization librarians GUI-based tools to do their tagging and text correction, and that is why they are commonly used.

Tagged PDFs are also used for accessible representations, because in theory, they provide the following:

  1. Logical Reading Order: Tagged PDFs establish a clear structure, enabling screen readers to read content in the intended order; poorly tagged documents may lead to confusion or inaccessibility.
  2. Efficient Navigation: Tags make it possible for users to quickly jump between headings and sections, similar to skimming a document visually, with their screen reading software.
  3. Element Identification: Tags indicate the type of content (paragraphs, lists, tables), providing context for screen reader users.
  4. Accessible Tables: Proper tagging helps screen readers interpret complex tables by identifying rows, columns, and headers.
  5. Alternative Text for Images: Figure tags include alt text, ensuring that non-text elements are described for users relying on assistive technologies.

An interesting thing to note is that tags in PDFs do not alter the visual appearance but add a structural layer that makes PDFs more compatible with assistive technologies.

Essentially, tagged PDFs are PDFs which are marked up with HTML-like tags for screen reader compatibility.

More on tagged PDFs. More on HTML vs Tagged PDFs

To read more about how AI is being used to create document structures in scanned images and PDFs, see: Azure Doc Layout Intelligence. Most AI services are similar to this.

What would happen if there are two tables of contents, one for chapters and one for accessibility?

For this question, I think we would need a way for tools developers to know which one is which, so they can handle them differently in their GUI. I think someone during the meeting we had discussed the possibility of adding a 'provides' property to Ranges in V4, since the accessibilityFeature property on schema.org supports both tableOfContents and structuralNavigation.
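Purely as a hypothetical sketch of that idea (provides is not a property in any current IIIF spec, and the values here are borrowed from the schema.org accessibilityFeature vocabulary; each Range's items are elided), a client could then tell the two structures apart:

```json
"structures": [
  {
    "id": "https://example.org/range/chapters",
    "type": "Range",
    "label": { "en": ["Chapters"] },
    "provides": "tableOfContents"
  },
  {
    "id": "https://example.org/range/doc-structure",
    "type": "Range",
    "label": { "en": ["Document structure"] },
    "provides": "structuralNavigation"
  }
]
```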

Other thoughts/considerations

A non-native IIIF solution would be to recommend that people use HTML files in the rendering field to support accessibility, with tools developers ensuring that this HTML is displayed in the browser for end users. But this does not support our community's current OCR-correction workflows or the use of the Content Search API, and it is not an IIIF-native solution.
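That recommendation would rest on the existing rendering property of the Manifest, along these lines (the URL and label are invented; a tagged PDF uses the same pattern with format application/pdf):

```json
"rendering": [
  {
    "id": "https://example.org/alternates/book.html",
    "type": "Text",
    "label": { "en": ["Structured HTML version"] },
    "format": "text/html"
  }
]
```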

Some things IIIF API creators could consider when thinking about how to support the functionality tagged PDFs provide to end users, natively in IIIF are…

  1. How can we implement a hierarchical document structure, with pagination, to support better accessibility of the pages displaying IIIF content?
  2. In addition to the higher level hierarchical document structure, how can we add detailed hierarchical screen-reader friendly markup to different parts of the document, with pagination?
  3. What is the best way to provide detailed descriptions of complex visual elements, ensuring that screen reader users receive equivalent information to sighted users, but also including this in the hierarchical document structure?

Other pain-points for current range and annotation based solutions include the following:

  1. Manifests with a long list of ranges can be cumbersome to navigate with screen readers and keyboard tabbing, requiring excessive keystrokes and potentially causing fatigue or frustration if the ranges are not paginated.
  2. Manifests with OCR annotations don’t provide a hierarchical display of content, which can be challenging for screen reader users. Without proper structuring, this can lead to excessive keystrokes for navigation, similar to websites with many unlabeled links. Users also don’t get the same contextual flow as they would with more structured documents, which makes it hard to understand their current location within the document itself.

More IIIF Considerations I can think of/were discussed when we met:

  1. The spec currently restricts the HTML that can be used in IIIF; to augment IIIF image content with highly structured HTML natively in IIIF, this restriction would need some revision.
  2. Annotations are already the basis of the Content Search API, which would make it easy for users to search the accessibility markup of our documents.
  3. Annotations are already at the core of OCR (Optical Character Recognition) creation and correction workflows in many institutions, and tools like Madoc.
  4. Any structured markup developed through layout analysis AI will need to be human corrected through tooling, see above point that tools like Madoc work on the annotation level for this.
  5. In v3, by using AnnotationCollections in conjunction with Ranges, it's possible to create more manageable and context-rich document structures. Any content that would map to an HTML header becomes a Range's label, and the content within the corresponding section is found under an AnnotationCollection. This is a good approach for long books, etc. Annotations themselves can have HTML as content, which works well for viewers rendering accessible HTML for an element within an image, for example a table. (A sketch of this shape follows the list below.) Two side notes: we will still need solutions for range-label correction that work within annotation-based OCR-correction tools, and we would want some cookbook or 'rule' saying clients should render the range label as an HTML header at the appropriate level, e.g. h1, h2, h3.
  6. But for v4: should we include structuralNavigation as an option for 'provides' on annotations, if annotations are to be used in Ranges this way, to allow tools developers to handle these annotations appropriately? (Also sketched below.)
  7. After looking here, it doesn't look like the Web Annotation model supports any hierarchical structures for annotations, but could Annotations, AnnotationPages, or AnnotationCollections be extended to provide hierarchical functionality too? Is it wise to keep Ranges as the mechanism for higher-level table-of-contents structures, and instead support and promote some annotation-native hierarchy for detailed document markup? Or does that muddy the waters and limit the use case of the existing Range object too much?
  8. Should Ranges and Range labels alone (no annotations) be used and promoted for detailed structural navigation, even though current digitization and OCR workflows tend to rely on annotations? How could Ranges handle complex elements like tables?
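To make item 5 concrete (and the item 6 question visible), a hypothetical sketch. The Range's label maps to an HTML heading, its text lives behind supplementary, and a table is carried as an HTML annotation body. Everything here is invented; note that the spec's current HTML restrictions (item 1) would strip the table markup, and provides on an Annotation is not a current IIIF property.

```json
{
  "id": "https://example.org/range/section-2-1",
  "type": "Range",
  "label": { "en": ["2.1 Results"] },
  "supplementary": {
    "id": "https://example.org/annos/section-2-1",
    "type": "AnnotationCollection"
  },
  "items": [
    { "id": "https://example.org/canvas/14", "type": "Canvas" }
  ]
}
```

One of the annotations in that collection might then carry a table:

```json
{
  "id": "https://example.org/anno/table-3",
  "type": "Annotation",
  "motivation": "supplementing",
  "provides": "structuralNavigation",
  "body": {
    "type": "TextualBody",
    "format": "text/html",
    "value": "<table><tr><th>Year</th><th>Count</th></tr><tr><td>1901</td><td>42</td></tr></table>"
  },
  "target": "https://example.org/canvas/14#xywh=200,900,1400,600"
}
```

A client following the 'rule' in item 5 would render the Range label as a heading at the appropriate depth (h2, h3, …) and the annotation body as an accessible HTML table.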

Related cookbooks

Thank you! I hope this helps provide more background and context (and not too much all at once!). If the spec writers want to have a Q&A with people who are deep in the accessibility space for print materials, let me know and I can arrange one to support these efforts.