jpmoreux closed this issue 7 years ago.
In IIIF Presentation API, segments of XML files may be extracted with URL-embedded XPath expressions. See http://iiif.io/api/presentation/2.1/#segments
IIIF Newspaper Implementation Notes: http://bit.ly/2a63PR6
IIIF issues: https://github.com/IIIF/iiif-stories/issues (see #77, #78, #79 and #80)
Issue renamed and repurposed. Closed.
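The ALTO Fragment Identifier API is a proposal for a web service that, in response to a standard HTTP or HTTPS request, returns either the IDs of the requested ALTO elements or the elements themselves as XML, as in the use cases below.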
This service aims to facilitate reuse of ALTO resources in digital libraries (bookmarks, annotations...). It could be used to embody the concept of hyperlinking within ALTO documents and to access the content itself.
The URI could specify any portion of an ALTO file (paragraph, string, illustration...), referenced by various mechanisms (ID, spatial offset, order...), or a range of content (e.g. paragraphs 2 to 5).
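The use cases below share one URL shape: a page URI ending in `f<page>.alto`, a mode segment (`id` to get matching element IDs, `xml` to get the elements), and a selector. The proposal gives no formal grammar, so this Python sketch assumes one inferred from those example URLs; `parse_fragment_url` and `AltoFragmentRequest` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar inferred from the example URLs in this proposal:
#   <document URI>/f<page>.alto/<mode>/<selector>
# <mode> is "id" (return matching element IDs) or "xml" (return the elements).
_FRAGMENT_URL = re.compile(
    r"^(?P<document>.+)/f(?P<page>\d+)\.alto/(?P<mode>id|xml)/(?P<selector>.+)$"
)


@dataclass
class AltoFragmentRequest:
    document: str  # e.g. the Gallica ark: URI of the digital document
    page: int      # page number taken from "f<page>"
    mode: str      # "id" or "xml"
    selector: str  # everything after the mode, left uninterpreted here


def parse_fragment_url(url: str) -> Optional[AltoFragmentRequest]:
    """Split a fragment URL into its parts; return None if it does not match."""
    match = _FRAGMENT_URL.match(url)
    if match is None:
        return None
    return AltoFragmentRequest(
        document=match["document"],
        page=int(match["page"]),
        mode=match["mode"],
        selector=match["selector"],
    )
```

For the URL in use case c, this yields page 26, mode `id` and selector `PrintSpace/*[@CONTENT]`; interpreting the selector is left to a separate resolver.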
Note: the ALTO schema itself is not affected. The idea is to write a specification that digital libraries can implement if they choose to.
Use cases
See: http://prezi.com/6fvgzri_z3b3/?utm_campaign=share&utm_medium=copy
a. A digital library user wants to reference a specific marginal note on a specific page of a digital document, given its spatial position:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/id/@89:485
RETURNS a list of block IDs: ("PAG_00000020_TB000010")
-> http://gallica.bnf.fr/ark:/12148/bpt6k96006893/f20.alto/xml/TextBlock[ID=PAG_00000020_TB000010]
RETURNS the TextBlock XML element:
<TextBlock ID="PAG_00000020_TB000010" WIDTH="1386" HEIGHT="287" VPOS="1090" HPOS="1303" STYLEREFS="TXT_18" LANG="fr">
  <TextLine ID="PAG_00000020_TL000016" WIDTH="1383" HEIGHT="63" VPOS="1090" HPOS="1304" STYLEREFS="TXT_18">
    <String ID="PAG_00000020_ST000071" ...
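A server could resolve the two requests of use case a against a parsed ALTO page roughly as follows. This is a sketch using Python's `xml.etree.ElementTree`, and the function names are hypothetical. The proposal does not define the coordinate system behind `@89:485`, so `blocks_at` simply assumes the point is expressed in the same units as the ALTO `HPOS`/`VPOS`/`WIDTH`/`HEIGHT` attributes and returns every block whose box contains it. Namespaces are ignored by comparing local tag names, so the same code should work across ALTO versions.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

_BOX = ("HPOS", "VPOS", "WIDTH", "HEIGHT")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def find_element(root: ET.Element, element_type: str, element_id: str) -> Optional[ET.Element]:
    """Resolve a selector like TextBlock[ID=PAG_00000020_TB000010]."""
    for el in root.iter():
        if local_name(el.tag) == element_type and el.get("ID") == element_id:
            return el
    return None


def blocks_at(root: ET.Element, x: float, y: float, element_type: str = "TextBlock") -> List[str]:
    """IDs of blocks whose HPOS/VPOS/WIDTH/HEIGHT box contains the point (x, y).

    Assumes (x, y) uses the same units as the ALTO coordinates.
    """
    ids = []
    for el in root.iter():
        if local_name(el.tag) != element_type:
            continue
        if any(el.get(name) is None for name in _BOX):
            continue  # skip blocks without a complete bounding box
        left, top = float(el.get("HPOS")), float(el.get("VPOS"))
        right = left + float(el.get("WIDTH"))
        bottom = top + float(el.get("HEIGHT"))
        if left <= x <= right and top <= y <= bottom:
            ids.append(el.get("ID"))
    return ids
```

Under that assumption, a point such as (1400, 1200) falls inside the box of `PAG_00000020_TB000010` as given above.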
b. An application wants to list all the images on a specific page of a digital document:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/Illustration
RETURNS a list of block IDs: ("PAG_00000026_IL000001")
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/xml/Illustration[ID=PAG_00000026_IL000001]
RETURNS the XML element:
<Illustration ID="PAG_00000026_IL000001" HPOS="744" VPOS="707" HEIGHT="3410" WIDTH="819"/>
From this XML content, the application can then extract the illustration using IIIF:
-> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k96128443/f26/744,707,819,3569/full/0/native.jpg
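Turning the returned `Illustration` element into a IIIF region is a matter of reordering its attributes as `x,y,w,h`. A minimal sketch, assuming the ALTO file is measured in pixels of the same image the IIIF server delivers (ALTO also allows other `MeasurementUnit` values such as `mm10` or `inch1200`, which would need scaling first); the helper names are hypothetical. Read directly from the attributes, the region is `744,707,819,3410`.

```python
import xml.etree.ElementTree as ET


def iiif_region(element: ET.Element) -> str:
    """IIIF 'x,y,w,h' region from an ALTO element's HPOS, VPOS, WIDTH, HEIGHT.

    Assumes the ALTO coordinates are pixels of the image served by IIIF.
    """
    values = (float(element.get(name)) for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
    return ",".join(str(round(v)) for v in values)


def illustration_url(image_base: str, element: ET.Element) -> str:
    """Region / size / rotation / quality.format, mirroring the Gallica URL above."""
    return f"{image_base}/{iiif_region(element)}/full/0/native.jpg"
```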
c. An application wants to extract all the text within the print space of a specific page:
-> http://gallica.bnf.fr/ark:/12148/bpt6k96128443/f26.alto/id/PrintSpace/*[@CONTENT]
RETURNS a list of block IDs: ("PAG_00000026_TB000002","PAG_00000026_TB000003","PAG_00000026_TB000004"...)
From these IDs, the application can then retrieve the XML elements and filter the text blocks to access the text itself.
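Use case c can be sketched the same way. Because the example response lists text block IDs, the hypothetical `print_space_block_ids` below reads the selector as "the `TextBlock` children of `PrintSpace`" (blocks nested inside a `ComposedBlock` are not covered), and `block_text` joins each line's `String/@CONTENT` values. Hyphenation markup (`HYP`, `SUBS_CONTENT`) is deliberately ignored in this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def print_space_block_ids(root: ET.Element) -> List[str]:
    """IDs of the TextBlocks directly inside each PrintSpace, in document order."""
    ids = []
    for el in root.iter():
        if local_name(el.tag) == "PrintSpace":
            ids.extend(child.get("ID") for child in el if local_name(child.tag) == "TextBlock")
    return ids


def block_text(block: ET.Element) -> str:
    """Join String CONTENT values: spaces between words, newlines between lines."""
    lines = []
    for line in block.iter():
        if local_name(line.tag) == "TextLine":
            words = [s.get("CONTENT", "") for s in line if local_name(s.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)
```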
Inspiration
IIIF Image API (http://iiif.io/api/image/2.0) specifies a web service that returns an image. The HTTP request can specify the region, size, rotation, quality characteristics and format of the requested image -> http://gallica.bnf.fr/iiif/ark:/12148/bpt6k65372641/f1/1165.4351015801358,833.7189616252821,969.8363431151238,964.1647855530472/171,170/0/native.jpg
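A IIIF Image API request is a fixed sequence of path segments, `{region}/{size}/{rotation}/{quality}.{format}`, appended to the image identifier: the kind of predictable URL template the ALTO proposal borrows. A small sketch of building and parsing such URLs; the function names are illustrative, and `native` in the Gallica example is the IIIF 1.x quality name for what version 2.0 calls `default`.

```python
def iiif_image_url(base: str, region: str = "full", size: str = "full",
                   rotation: str = "0", quality: str = "default", fmt: str = "jpg") -> str:
    """Append region/size/rotation/quality.format to an image base URI."""
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"


def parse_iiif_image_url(url: str) -> dict:
    """Inverse of iiif_image_url: peel the last four path segments off the URL."""
    base, region, size, rotation, last = url.rsplit("/", 4)
    quality, fmt = last.rsplit(".", 1)
    return {"base": base, "region": region, "size": size,
            "rotation": rotation, "quality": quality, "format": fmt}
```

Parsing from the right keeps identifiers that contain `/` (such as Gallica's `ark:/12148/...`) intact in `base`.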
The EPUB format has a recommended specification on fragment identifiers (EPUB Canonical Fragment Identifiers, http://www.idpf.org/epub/linking/cfi/epub-cfi.html) that helps express paths to specific locations within the content: -> book.epub#epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:10)
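For comparison, the CFI in that example can be decomposed mechanically: even step numbers select child elements, odd ones select the character data between them, `!` steps from the package document into the referenced content document, and `:10` is a character offset. The parser below is a simplified sketch that handles only this subset (no ranges, spatial or temporal offsets, or `^` escapes) and is not a conforming CFI implementation.

```python
import re
from typing import List, Optional, Tuple

Step = Tuple[int, Optional[str]]  # (step index, optional [id] assertion)
_STEP = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")


def parse_cfi(uri: str) -> Tuple[List[List[Step]], Optional[int]]:
    """Split an epubcfi(...) fragment into indirection segments and a character offset.

    Simplified subset: no ranges, no spatial/temporal offsets, no '^' escapes.
    """
    fragment = uri.split("#", 1)[-1]
    match = re.fullmatch(r"epubcfi\((.*)\)", fragment)
    if match is None:
        raise ValueError(f"not an epubcfi fragment: {uri!r}")
    path, offset = match.group(1), None
    if ":" in path:
        path, raw_offset = path.rsplit(":", 1)
        offset = int(raw_offset)
    # Each '!' marks a step into the document referenced by the previous segment.
    segments = [
        [(int(index), ref or None) for index, ref in _STEP.findall(part)]
        for part in path.split("!")
    ]
    return segments, offset
```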
Related work: http://pro.europeana.eu/blogpost/europeana-aligns-with-the-international-image-interoperability-framework-iiif http://pro.europeana.eu/files/Europeana_Professional/Projects/Project_list/Europeana_Cloud/Deliverables/D4.4%20Recommendations%20For%20Enhancing%20EDM%20to%20Support%20Research%20Oriented%20Content.pdf