altoxml / schema

ALTO XML schema - latest and all former versions

ALTO & IIIF integration #45

Open · altomator opened this issue 7 years ago

altomator commented 7 years ago

This issue replaces issue #33 ("Fragment identifier API for ALTO"): a decision was made to use the IIIF Presentation API instead.

Draft: https://docs.google.com/document/d/1Jn-iwJpGI6SRt1a6s8aZFEZvLAxCZ17zRlVkLIvtevk/edit?usp=sharing

IIIF Newspaper Implementation Notes: http://bit.ly/2a63PR6

IIIF Slack channel: https://iiif.slack.com/messages/C1483BWMT

IIIF issues: https://github.com/IIIF/iiif-stories/issues (see #77, #78, #79, #80)

IIIF Working Group on text granularity:

IIIF/text granularity Slack channel: https://iiif.slack.com/messages/C5R68LH51

Vatican Conference Presentations:

acpopat commented 7 years ago

Hi, this seems like a good way to learn about both ALTO and IIIF in general, so I'm assigning it to myself. If someone is willing to mentor me on this, that would be wonderful.

altomator commented 7 years ago

Hi Ashok,

Clemens and I have followed this topic over the last few months. It started with an idea of mine (see #33), but we quickly moved to the IIIF Presentation API. There is a draft (linked above) that summarizes our use cases, and there is an ongoing IIIF working group on text granularity, which is one of our most pressing issues right now.

altomator commented 7 years ago

The issue has been updated with new material from the IIIF Text Granularity working group (August and September meeting notes), along with our draft: https://docs.google.com/document/d/1Jn-iwJpGI6SRt1a6s8aZFEZvLAxCZ17zRlVkLIvtevk/edit?usp=sharing

altomator commented 6 years ago

Sample of IIIF annotations at word, line and paragraph levels (NCSU Libraries):

Document: https://d.lib.ncsu.edu/collections/catalog/ua100_014-002-bx0028-010-000#?c=0&m=0&s=0&cv=0&z=-2215.0216%2C-251.5556%2C7978.0432%2C5031.1111

Manifest: https://d.lib.ncsu.edu/collections/catalog/ua100_014-002-bx0028-010-000/manifest

```json
"otherContent": [
  {
    "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-word.json",
    "@type": "sc:AnnotationList",
    "label": "Text of this page (word level)"
  },
  {
    "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-line.json",
    "@type": "sc:AnnotationList",
    "label": "Text of this page (line level)"
  },
  {
    "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-paragraph.json",
    "@type": "sc:AnnotationList",
    "label": "Text of this page (paragraph level)"
  }
]
```
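The three `otherContent` entries differ only in granularity, which this manifest expresses in the human-readable `label`. A minimal Python sketch, assuming the canvas's `otherContent` array has already been fetched and parsed into a list of dicts, that picks the annotation list for one level by matching the "(<level> level)" suffix of the label. The `find_annotation_list` helper and the label-matching rule are hypothetical and specific to how NCSU labels its lists; the Presentation API itself does not define a granularity field here.

```python
import re
from typing import Optional


def find_annotation_list(other_content: list[dict], level: str) -> Optional[str]:
    """Return the @id of the sc:AnnotationList whose label ends in '(<level> level)'.

    Hypothetical helper: relies on NCSU-style labels such as
    'Text of this page (word level)', not on any field defined by IIIF.
    """
    pattern = re.compile(rf"\({re.escape(level)} level\)\s*$")
    for entry in other_content:
        if entry.get("@type") != "sc:AnnotationList":
            continue
        if pattern.search(entry.get("label", "")):
            return entry["@id"]
    return None  # no list at the requested granularity
```

For example, `find_annotation_list(canvas["otherContent"], "word")` would return the `...-annotation-list-word.json` URL from the excerpt above. Matching on labels is fragile, which is part of why a machine-readable granularity property is needed.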

Word level annotations list: https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-word.json

```json
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-word.json",
  "@type": "sc:AnnotationList",
  "@label": "OCR text granularity of word",
  "resources": [
    {
      "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-word/471,303,162,156",
      "@type": "oa:Annotation",
      "motivation": "sc:painting",
      "resource": {
        "@type": "cnt:ContentAsText",
        "format": "text/plain",
        "chars": "~A"
      },
      "on": "https://d.lib.ncsu.edu/collections/canvas/ua100_014-002-bx0028-010-000_0001#xywh=471,303,162,156"
    },
    {
      "@id": "https://ocr.lib.ncsu.edu/ocr/ua/ua100_014-002-bx0028-010-000_0001/ua100_014-002-bx0028-010-000_0001-annotation-list-word/714,308,709,153",
      "@type": "oa:Annotation",
      "motivation": "sc:painting",
      "resource": {
        "@type": "cnt:ContentAsText",
        "format": "text/plain",
        "chars": "Mechanical"
      },
      "on": "https://d.lib.ncsu.edu/collections/canvas/ua100_014-002-bx0028-010-000_0001#xywh=714,308,709,153"
    },
    ...
```
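On the ALTO side, each word is a `String` element whose `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes give its bounding box and whose `CONTENT` attribute gives its text, so a word-level list like the one above can in principle be generated directly from ALTO. A minimal Python sketch, assuming the ALTO coordinates are already in canvas pixels (ALTO files in `mm10` or `inch1200` units would need scaling first); `canvas_id` and `list_id` are hypothetical parameters, and the output only mirrors the shape of the NCSU sample, it is not NCSU's code:

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so any ALTO version's namespace matches."""
    return tag.rsplit("}", 1)[-1]


def alto_words_to_annotation_list(alto_xml: str, canvas_id: str, list_id: str) -> dict:
    """Build a IIIF Presentation 2 sc:AnnotationList with one annotation per ALTO String.

    Assumption: HPOS/VPOS/WIDTH/HEIGHT are in the same pixel space as the canvas.
    """
    root = ET.fromstring(alto_xml)
    resources = []
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue  # skip Page, TextBlock, TextLine, SP, HYP, ...
        # ALTO coordinates may be written as floats; canvas fragments use integers.
        x, y, w, h = (round(float(el.get(a, "0"))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        xywh = f"{x},{y},{w},{h}"
        resources.append({
            "@id": f"{list_id}/{xywh}",  # same id pattern as the NCSU sample
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "format": "text/plain",
                "chars": el.get("CONTENT", ""),
            },
            "on": f"{canvas_id}#xywh={xywh}",  # media fragment targeting the word box
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": resources,
    }
```

Serialising the returned dict with `json.dumps` gives a document with the same structure as the word-level list above. The same loop could target `TextLine` or `TextBlock` elements to produce line- or block-level lists.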

artunit commented 4 years ago

As per the 2019-12-13 ALTO Board Meeting, I am linking the IIIF Text Granularity Extension and the associated GitHub discussion to this issue.
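As far as we understand the IIIF Text Granularity Extension, it replaces label conventions like NCSU's with a `textGranularity` property on annotations, whose values include `page`, `block`, `paragraph`, `line`, `word` and `glyph`. These line up closely with ALTO's layout hierarchy. A minimal Python sketch of one possible mapping; the `ALTO_TO_GRANULARITY` table and both helpers are hypothetical, not part of ALTO or the extension. ALTO has no separate paragraph element, so `paragraph` is left unmapped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from ALTO element local names to Text Granularity values.
ALTO_TO_GRANULARITY = {
    "Page": "page",
    "TextBlock": "block",
    "TextLine": "line",
    "String": "word",
    "Glyph": "glyph",
}


def granularities_in(alto_xml: str) -> list[str]:
    """List the granularity of every mappable ALTO element, in document order."""
    root = ET.fromstring(alto_xml)
    levels = []
    for el in root.iter():
        level = ALTO_TO_GRANULARITY.get(el.tag.rsplit("}", 1)[-1])
        if level is not None:
            levels.append(level)
    return levels


def tag_annotation(annotation: dict, alto_element_name: str) -> dict:
    """Return a copy of an annotation with a textGranularity property added."""
    level = ALTO_TO_GRANULARITY.get(alto_element_name)
    if level is None:
        raise ValueError(f"no granularity mapping for ALTO element {alto_element_name!r}")
    return {**annotation, "textGranularity": level}
```

With a property like this, a client could filter annotations by level directly instead of relying on separately labelled annotation lists.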

artunit commented 3 years ago

As per the 2021-04-29 Board Meeting and feedback from the IIIF Text Granularity group, I am removing the _high priority label from this issue. The issue will be kept open, but given the Text Granularity Extension, new activity on this should result in a new issue.