altoxml / schema

ALTO XML schema - latest and all former versions

Fragment identifier API for ALTO #33

Closed jpmoreux closed 7 years ago

jpmoreux commented 9 years ago

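The ALTO Fragment Identifier API is a proposal for a web service that, in response to a standard HTTP or HTTPS request, returns a fragment of an ALTO resource: either a list of element IDs or the XML elements themselves.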

This service aims to facilitate reuse of ALTO resources in digital libraries (bookmarks, annotations...). It could be used to embody the concept of hyperlinking within ALTO documents and to give access to the content itself.

The URI could specify any portion of an ALTO file (paragraph, string, illustration...), referenced by various mechanisms (ID, spatial offset, order...), or a range of contents (e.g. paragraphs 2 to 5).

Note: the ALTO schema is not impacted. The idea is to write a specification that digital libraries can implement if they wish.
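
The request URLs listed under "Use cases" all follow the shape `<page>.alto/<mode>/<selector>`, where the mode is `id` or `xml`. A minimal sketch of how a server might split such a request, assuming that shape holds in general. The pattern and the names `FragmentRequest` and `parse_fragment_url` are illustrative and not part of any agreed syntax:

```python
import re
from typing import NamedTuple


class FragmentRequest(NamedTuple):
    resource: str  # URL of the page, without the ".alto" suffix
    mode: str      # "id" (return element IDs) or "xml" (return XML elements)
    selector: str  # everything after the mode, left uninterpreted here


# Assumed shape: <resource>.alto/<mode>/<selector>
_PATTERN = re.compile(
    r"^(?P<resource>.+)\.alto/(?P<mode>id|xml)/(?P<selector>.+)$"
)


def parse_fragment_url(url: str) -> FragmentRequest:
    """Split a fragment request URL into resource, mode and selector."""
    match = _PATTERN.match(url)
    if match is None:
        raise ValueError(f"not an ALTO fragment URL: {url!r}")
    return FragmentRequest(**match.groupdict())
```

The selector is deliberately kept as an opaque string: how `@89:485`, `Illustration` or `PrintSpace/*[@CONTENT]` should be interpreted is exactly what the syntax specification would have to define.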

Use cases

See: http://prezi.com/6fvgzri_z3b3/?utm_campaign=share&utm_medium=copy

a. A digital library user wants to reference a specific marginal note on a specific page of a digital document, given its spatial position:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/id/@89:485
RETURNS a list of block IDs: ("PAG_00000020_TB000010")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/xml/TextBlock[ID=PAG_00000020_TB000010]
RETURNS the TextBlock XML element:
<TextBlock ID="PAG_00000020_TB000010" WIDTH="1386" HEIGHT="287" VPOS="1090" HPOS="1303" STYLEREFS="TXT_18" LANG="fr">
  <TextLine ID="PAG_00000020_TL000016" WIDTH="1383" HEIGHT="63" VPOS="1090" HPOS="1304" STYLEREFS="TXT_18">
    <String ID="PAG_00000020_ST000071" ...
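
One way a server could resolve a spatial selector such as `@89:485` is a bounding-box hit test over the page's blocks. The sketch below assumes the two numbers are a horizontal and a vertical offset in the same units as HPOS and VPOS; the proposal does not state this, and the coordinates of the example block suggest another unit may be intended. The function name `blocks_at` and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET

# Block-level element names defined by the ALTO schema.
BLOCK_TAGS = {"TextBlock", "Illustration", "GraphicalElement", "ComposedBlock"}

# Hypothetical two-block page, used only to illustrate the lookup.
SAMPLE = """<alto><Layout><Page ID="P1"><PrintSpace>
  <TextBlock ID="TB1" HPOS="50" VPOS="400" WIDTH="100" HEIGHT="200"/>
  <TextBlock ID="TB2" HPOS="300" VPOS="400" WIDTH="100" HEIGHT="200"/>
</PrintSpace></Page></Layout></alto>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def blocks_at(alto_xml: str, x: float, y: float) -> list[str]:
    """Return the IDs of all blocks whose bounding box contains (x, y)."""
    hits = []
    for el in ET.fromstring(alto_xml).iter():
        if local_name(el.tag) not in BLOCK_TAGS:
            continue
        try:
            hpos, vpos = float(el.get("HPOS")), float(el.get("VPOS"))
            width, height = float(el.get("WIDTH")), float(el.get("HEIGHT"))
        except (TypeError, ValueError):
            continue  # skip blocks with missing or non-numeric geometry
        if hpos <= x <= hpos + width and vpos <= y <= vpos + height:
            hits.append(el.get("ID", ""))
    return hits
```

With the sample page, `blocks_at(SAMPLE, 89, 485)` returns `['TB1']`, since only the first block's box covers that point.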

b. An application wants to list all the images on a specific page of a digital document:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/Illustration
RETURNS a list of block IDs: ("PAG_00000026_IL000001")

-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/xml/Illustration[ID=PAG_00000026_IL000001]
RETURNS the XML element:
<Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>

From this XML content, the application can then extract the illustration using IIIF:
-> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26/744,707,819,3569/full/0/native.jpg
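
The step from the Illustration element to the IIIF request can be sketched as follows. The IIIF Image API expresses a region as `x,y,w,h`, which maps onto HPOS, VPOS, WIDTH and HEIGHT, assuming the ALTO coordinates are in pixels of the served image. The function name `iiif_region_url` is hypothetical, and the `full/0/native.jpg` suffix is copied from the example URL. Note that the example URL carries a height of 3569 where the element says 3410; the sketch simply follows the element's attributes:

```python
import xml.etree.ElementTree as ET


def iiif_region_url(iiif_base: str, block_xml: str) -> str:
    """Build a IIIF Image API URL cropping the image to an ALTO block.

    iiif_base is the image identifier URL without region, size,
    rotation, quality or format.
    """
    el = ET.fromstring(block_xml)
    # IIIF region order is x,y,w,h.
    x, y, w, h = (el.attrib[name] for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
    return f"{iiif_base}/{x},{y},{w},{h}/full/0/native.jpg"


illustration = (
    '<Illustration ID="PAG_00000026_IL000001" '
    'HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>'
)
url = iiif_region_url(
    "http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26", illustration
)
```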

c. An application wants to extract all the text within the print space of a specific page:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/PrintSpace/*[@CONTENT]
RETURNS a list of block IDs: ("PAG_00000026_TB000002","PAG_00000026_TB000003","PAG_00000026_TB000004"...)

From these IDs, the application can then extract the XML elements and filter the text blocks to access the text itself.
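
The final filtering step, turning a returned TextBlock element into plain text, might look like the sketch below. It relies only on the ALTO convention that each String element carries its word in the CONTENT attribute. The function name `block_text` and the sample block are hypothetical, and hyphenation (HYP elements) is ignored for brevity:

```python
import xml.etree.ElementTree as ET

# Hypothetical block, standing in for the XML returned by the service.
SAMPLE_BLOCK = """<TextBlock ID="TB2">
  <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
  <TextLine><String CONTENT="again"/></TextLine>
</TextBlock>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def block_text(block_xml: str) -> str:
    """Join String/@CONTENT values: spaces within a line, newlines between lines."""
    lines = []
    for line in ET.fromstring(block_xml).iter():
        if local_name(line.tag) != "TextLine":
            continue
        words = [
            s.get("CONTENT", "")
            for s in line.iter()
            if local_name(s.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)
```

An application would call `block_text` once per TextBlock returned for the IDs above and concatenate the results in reading order.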

Inspiration

The IIIF Image API (http://iiif.io/api/image/2.0) specifies a web service that returns an image. The HTTP request can specify the region, size, rotation, quality characteristics and format of the requested image:
-> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k65372641/f1/1165.4351015801358,833.7189616252821,969.8363431151238,964.1647855530472/171,170/0/native.jpg

The EPUB format has a recommended specification on Fragment Identifiers (http://www.idpf.org/epub/linking/cfi/epub-cfi.html) that helps to express paths to specific locations within the content:
-> book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)

Related work:
http://pro.europeana.eu/blogpost/europeana-aligns-with-the-international-image-interoperability-framework-iiif
http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_Cloud/Deliverables/D4.4%20Recommendations%20For%20Enhancing%20EDM%20to%20Support%20Research%20Oriented%20Content.pdf

Actions

  1. Use cases survey
  2. Contact with IIIF?
  3. Syntax specs

cneud commented 8 years ago

In the IIIF Presentation API, segments of XML files may be extracted with URL-embedded XPath expressions. See http://iiif.io/api/presentation/2.1/#segments

altomator commented 8 years ago

IIIF annotations and ALTO.pdf

altomator commented 7 years ago

IIIF Newspaper Implementation Notes: http://bit.ly/2a63PR6

IIIF Issues: https://github.com/IIIF/iiif-stories/issues See #77, #78, #79, #80

altomator commented 7 years ago

Draft: https://docs.google.com/document/d/1Jn-iwJpGI6SRt1a6s8aZFEZvLAxCZ17zRlVkLIvtevk/edit?usp=sharing

cowboyMontana commented 7 years ago

Issue renamed and repurposed. Closed.