Fragment identifier API for ALTO

jpmoreux commented 9 years ago

The ALTO Fragment Identifier API is a proposal for a web service that, in response to a standard HTTP or HTTPS request:

references arbitrary content within an ALTO file through the use of fragment identifiers (referencing),
returns the XML contents referenced by such identifiers (dereferencing).

This service aims to facilitate reuse of ALTO resources in digital libraries (bookmarks, annotations...). It could be used to embody the concept of hyperlinking within ALTO documents, and to access the content itself.

The URI could specify any portion of an ALTO file (paragraph, string, illustration...) referenced by various mechanisms (ID, spatial offset, order...), a range of contents (paragraphs 2 to 5), etc.
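The URLs in the use cases share a common shape that a service could parse mechanically. Below is a minimal Python sketch. It assumes the layout `<document>/f<page>.alto/<mode>/<selector>`, which is inferred from those examples. In that layout, `id` asks for block IDs (referencing) and `xml` asks for XML elements (dereferencing). The regex and the `FragmentRequest` / `parse_fragment_url` names are hypothetical and not part of any agreed syntax:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical URL layout inferred from the Gallica examples in this issue:
#   <document>/f<page>.alto/<mode>/<selector>
# where <mode> is "id" (referencing) or "xml" (dereferencing).
FRAGMENT_URL = re.compile(
    r"^(?P<document>.+)/f(?P<page>\d+)\.alto/(?P<mode>id|xml)/(?P<selector>.+)$"
)


@dataclass
class FragmentRequest:
    document: str
    page: int
    mode: str      # "id" returns block IDs, "xml" returns XML elements
    selector: str  # e.g. "@89:485", "Illustration", "TextBlock[ID=...]"


def parse_fragment_url(url: str) -> Optional[FragmentRequest]:
    """Split a proposed fragment URL into its parts; None if it does not match."""
    m = FRAGMENT_URL.match(url)
    if m is None:
        return None
    return FragmentRequest(m["document"], int(m["page"]), m["mode"], m["selector"])


req = parse_fragment_url(
    "http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/Illustration"
)
print(req)
```

The selector is kept as an opaque string here. This lets the same parser carry IDs, coordinates or XPath-like paths such as `PrintSpace/*[@CONTENT]` until the selector syntax itself is specified.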

Note: the ALTO schema is not affected. The whole idea is to write a specification to be implemented by digital libraries (if they are willing to).

Use cases

See: http://prezi.com/6fvgzri_z3b3/?utm_campaign=share&utm_medium=copy

a. A digital library user wants to reference a specific marginal note on a specific page of a digital document, given its spatial position: -> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/id/@89:485 RETURNS a list of block IDs: ("PAG_00000020_TB000010")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/xml/TextBlock[ID=PAG_00000020_TB000010] RETURNS the TextBlock XML element: <TextBlock ID="PAG_00000020_TB000010" WIDTH="1386" HEIGHT="287" VPOS="1090" HPOS="1303" STYLEREFS="TXT_18" LANG="fr"> <TextLine ID="PAG_00000020_TL000016" WIDTH="1383" HEIGHT="63" VPOS="1090" HPOS="1304" STYLEREFS="TXT_18"> <String ID="PAG_00000020_ST000071" ...
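Both steps of use case a can be prototyped against a parsed ALTO page with the standard library's `xml.etree.ElementTree`. Here is a minimal sketch. It assumes that `@89:485` means the point HPOS=89, VPOS=485, expressed in the file's own measurement unit. The helper names and the toy page are hypothetical:

```python
import copy
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Toy ALTO page with hypothetical IDs and coordinates (namespace omitted).
SAMPLE_ALTO = (
    '<alto><Layout><Page ID="P1"><PrintSpace>'
    '<TextBlock ID="TB1" HPOS="50" VPOS="400" WIDTH="200" HEIGHT="300">'
    '<TextLine ID="TL1"><String ID="ST1" CONTENT="Note"/></TextLine></TextBlock>'
    '<TextBlock ID="TB2" HPOS="300" VPOS="400" WIDTH="900" HEIGHT="300"/>'
    '</PrintSpace></Page></Layout></alto>'
)

BLOCK_TYPES = {"TextBlock", "Illustration", "GraphicalElement", "ComposedBlock"}


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so ALTO v2, v3 and v4 files all match."""
    return tag.rsplit("}", 1)[-1]


def blocks_at(root: ET.Element, x: float, y: float) -> list[str]:
    """Referencing: IDs of blocks whose HPOS/VPOS/WIDTH/HEIGHT box contains (x, y)."""
    hits = []
    for el in root.iter():
        if local_name(el.tag) not in BLOCK_TYPES:
            continue
        left, top = float(el.get("HPOS", 0)), float(el.get("VPOS", 0))
        right = left + float(el.get("WIDTH", 0))
        bottom = top + float(el.get("HEIGHT", 0))
        if left <= x <= right and top <= y <= bottom:
            hits.append(el.get("ID"))
    return hits


def dereference(root: ET.Element, selector: str) -> Optional[str]:
    """Dereferencing: serialize the element matched by e.g. 'TextBlock[ID=TB1]'."""
    m = re.fullmatch(r"(\w+)\[ID=([^\]]+)\]", selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    name, wanted_id = m.groups()
    for el in root.iter():
        if local_name(el.tag) == name and el.get("ID") == wanted_id:
            clone = copy.copy(el)  # shallow copy: leave the parsed tree untouched
            clone.tail = None      # tostring() would otherwise append trailing text
            return ET.tostring(clone, encoding="unicode")
    return None


root = ET.fromstring(SAMPLE_ALTO)
ids = blocks_at(root, 89, 485)  # "@89:485" read as HPOS:VPOS
print(ids)
print(dereference(root, f"TextBlock[ID={ids[0]}]"))
```

Namespaced ALTO files will be serialized with generated prefixes (`ns0:`) unless `ET.register_namespace` is called first. A real service would also have to decide whether nested blocks that all contain the point should all be returned, for example a `ComposedBlock` together with its children.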

b. An application wants to list all the images on a specific page of a digital document: -> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/Illustration RETURNS a list of block IDs: ("PAG_00000026_IL000001")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/xml/Illustration[ID=PAG_00000026_IL000001] RETURNS the XML element: <Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>

From this XML content, the application can then extract the illustration using IIIF: -> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26/744,707,819,3569/full/0/native.jpg
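The list-then-crop workflow of use case b can be chained in a few lines. Below is a minimal Python sketch. It assumes that ALTO coordinates share the IIIF image's pixel space, which holds only when `MeasurementUnit` is `pixel` and the OCR ran on an image of the same resolution. The function name is hypothetical. `native` is the IIIF Image API 1.x quality name used in the Gallica URL above; version 2.0 calls it `default`:

```python
import xml.etree.ElementTree as ET

# Toy page containing the Illustration element quoted in use case b.
SAMPLE_ALTO = (
    '<alto><Layout><Page ID="PAG_00000026"><PrintSpace>'
    '<Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>'
    '</PrintSpace></Page></Layout></alto>'
)


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def illustration_urls(
    root: ET.Element, image_base: str, quality: str = "native", fmt: str = "jpg"
) -> dict[str, str]:
    """Map each Illustration ID to a IIIF Image API URL cropping its bounding box."""
    urls = {}
    for el in root.iter():
        if local_name(el.tag) != "Illustration":
            continue
        # IIIF region is x,y,w,h; round because ALTO allows float coordinates.
        x, y, w, h = (round(float(el.get(a))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        urls[el.get("ID")] = f"{image_base}/{x},{y},{w},{h}/full/0/{quality}.{fmt}"
    return urls


root = ET.fromstring(SAMPLE_ALTO)
base = "http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26"
for block_id, url in illustration_urls(root, base).items():
    print(block_id, url)
```

This maps HEIGHT directly, so the region requested here is 744,707,819,3410. The example URL above uses 3569 as the height instead.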

c. An application wants to extract all the text within the print space of a specific page: -> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/PrintSpace/*[@CONTENT] RETURNS a list of block IDs: ("PAG_00000026_TB000002","PAG_00000026_TB000003","PAG_00000026_TB000004"...)

From these IDs, the application can then retrieve the XML elements and filter the text blocks to access the text itself.
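Once the text blocks are known, recovering their text means concatenating the `CONTENT` attributes of their `String` elements. Below is a minimal Python sketch. It assumes word-level `String` elements and joins words with single spaces; it does not interpret `SP` widths or `HYP` hyphenation. The names and sample data are hypothetical:

```python
import xml.etree.ElementTree as ET

# Toy page with hypothetical IDs and words (namespace omitted).
SAMPLE_ALTO = (
    '<alto><Layout><Page ID="P1"><PrintSpace>'
    '<TextBlock ID="TB2"><TextLine ID="TL1"><String ID="S1" CONTENT="Fragment"/><SP/>'
    '<String ID="S2" CONTENT="identifiers"/></TextLine>'
    '<TextLine ID="TL2"><String ID="S3" CONTENT="for"/><SP/><String ID="S4" CONTENT="ALTO"/></TextLine>'
    '</TextBlock><Illustration ID="IL1" HPOS="0" VPOS="0" WIDTH="10" HEIGHT="10"/>'
    '</PrintSpace></Page></Layout></alto>'
)


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def print_space_text(root: ET.Element) -> dict[str, str]:
    """Map each TextBlock ID inside PrintSpace to its text, one line per TextLine."""
    texts = {}
    for space in (e for e in root.iter() if local_name(e.tag) == "PrintSpace"):
        # iter() also reaches TextBlocks nested in ComposedBlocks.
        for block in space.iter():
            if local_name(block.tag) != "TextBlock":
                continue  # skip illustrations, graphical elements, etc.
            lines = []
            for line in block:
                if local_name(line.tag) == "TextLine":
                    words = [s.get("CONTENT", "") for s in line if local_name(s.tag) == "String"]
                    lines.append(" ".join(words))
            texts[block.get("ID")] = "\n".join(lines)
    return texts


root = ET.fromstring(SAMPLE_ALTO)
print(print_space_text(root))
```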

Inspiration

IIIF Image API (http://iiif.io/api/image/2.0) specifies a web service that returns an image. The HTTP request can specify the region, size, rotation, quality characteristics and format of the requested image -> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k65372641/f1/1165.4351015801358,833.7189616252821,969.8363431151238,964.1647855530472/171,170/0/native.jpg

EPUB has a recommended specification for fragment identifiers, EPUB Canonical Fragment Identifiers (http://www.idpf.org/epub/linking/cfi/epub-cfi.html), which expresses paths to specific locations within the content: -> book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
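The CFI example breaks down into three parts:

* hops separated by `!`, where each `!` steps into a referenced document;
* steps of the form `/N[id]`, where even indices address child elements and odd indices address the content between them;
* a final `:N` character offset.

The simplified Python tokenizer below only illustrates how a comparable ALTO path syntax could be parsed. It is not a conforming CFI implementation, and the names are hypothetical:

```python
import re
from typing import Optional

# One CFI step: "/N" optionally followed by an ID assertion "[id]".
STEP = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")


def parse_cfi(uri: str) -> tuple[list[list[tuple[int, Optional[str]]]], Optional[int]]:
    """Split an epubcfi(...) path into '!'-separated hops of steps plus a character offset.

    Simplified: ignores escaping (^), ranges, and spatial/temporal offsets.
    """
    m = re.search(r"epubcfi\((.*)\)", uri)
    path = m.group(1) if m else uri
    offset = None
    if ":" in path:
        path, tail = path.rsplit(":", 1)
        offset = int(tail)
    hops = [
        [(int(index), assertion or None) for index, assertion in STEP.findall(hop)]
        for hop in path.split("!")
    ]
    return hops, offset


hops, offset = parse_cfi("book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)")
print(hops, offset)
```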

Actions

Use cases survey
Contact IIIF?
Syntax specs

cneud commented 8 years ago

In the IIIF Presentation API, segments of XML files may be extracted with URL-embedded XPath expressions. See http://iiif.io/api/presentation/2.1/#segments

altomator commented 8 years ago

IIIF annotations and ALTO.pdf

altomator commented 7 years ago

IIIF Newspaper Implementation Notes: http://bit.ly/2a63PR6

IIIF Issues: https://github.com/IIIF/iiif-stories/issues See #77, #78, #79, #80

altomator commented 7 years ago

Draft: https://docs.google.com/document/d/1Jn-iwJpGI6SRt1a6s8aZFEZvLAxCZ17zRlVkLIvtevk/edit?usp=sharing

cowboyMontana commented 7 years ago

Issue renamed and repurposed. Closed.

altoxml / schema