IIIF / iiif-stories

Community repository for documenting stories and use cases related to uses of the International Image Interoperability Framework.

I would like to get access to a raw OCR fragment #80

Open altomator opened 7 years ago

altomator commented 7 years ago

Description

Some use cases need to get access to information stored in the OCR format:

For these use cases, getting access to the raw OCR objects (or reference to the...) from the IIIF annotation layer would be useful.

benwbrum commented 7 years ago

From the perspective of an OCR correction platform, I (the correction tool) would like to

tomcrane commented 7 years ago

So far, people have been using seeAlso to link from canvas to ALTO:

"seeAlso": {
  "@id": "http://wellcomelibrary.org/service/alto/b22014068/0?image=11",
  "format": "text/xml",
  "profile": "http://www.loc.gov/standards/alto/v3/alto.xsd",
  "label": "METS-ALTO XML"
}

The Newspaper working group have some guidelines around this - https://www.slideshare.net/kestlund/newspapers-iiif-and-alto

This could also be modelled as a service.
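
For a client, consuming the seeAlso pattern could look roughly like the Python sketch below. It is only an illustration: the function name `alto_links` and the rule "the profile starts with the Library of Congress ALTO URL" are assumptions made here, not something defined by the IIIF specs, and `seeAlso` is handled as either a single value or a list.

```python
# Assumed convention: an ALTO seeAlso entry carries a profile under this prefix.
ALTO_PROFILE_PREFIX = "http://www.loc.gov/standards/alto/"


def alto_links(canvas):
    """Return the @id of every seeAlso entry on a canvas that looks like ALTO."""
    see_also = canvas.get("seeAlso", [])
    # Presentation 2.x allows a single value as well as a list of values.
    if isinstance(see_also, (str, dict)):
        see_also = [see_also]

    links = []
    for entry in see_also:
        if not isinstance(entry, dict):
            continue  # a bare URI carries no profile, so it cannot be identified
        profile = entry.get("profile", "")
        if (
            isinstance(profile, str)
            and profile.startswith(ALTO_PROFILE_PREFIX)
            and "@id" in entry
        ):
            links.append(entry["@id"])
    return links
```

Matching on `profile` rather than on `format` is a deliberate choice in this sketch, since `text/xml` alone does not say which XML vocabulary is being linked.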

altomator commented 7 years ago

My concern is that accessing the right element in the OCR file from the text annotation is not a straightforward process (using the geometrical information?).

{
  "@id": "http://dams.llgc.org.uk/iiif/3320863/annotation/5014243419640",
  "@type": "oa:Annotation",
  "motivation": "sc:painting",
  "resource": {
    "@type": "cnt:ContentAsText",
    "format": "text/plain",
    "chars": "NEWS."
  },
  "on": "http://dams.llgc.org.uk/iiif/3320860/canvas/3320863#xywh=5014,2434,196,40"
}
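
To make the geometric route concrete, here is a rough Python sketch of what a client would have to do today: read the `xywh` fragment from `on`, then pick the ALTO `String` whose box overlaps it most. It assumes the ALTO coordinates are in pixels and in the same coordinate space as the canvas, which is often not the case (ALTO also allows other measurement units, and the canvas may be scaled). The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def parse_xywh(on_uri):
    """Extract (x, y, w, h) from a canvas URI ending in #xywh=x,y,w,h."""
    fragment = urlparse(on_uri).fragment
    if not fragment.startswith("xywh="):
        raise ValueError("no xywh fragment in %r" % on_uri)
    x, y, w, h = (float(v) for v in fragment[len("xywh="):].split(","))
    return x, y, w, h


def overlap_area(a, b):
    """Area of the intersection of two (x, y, w, h) boxes, 0 if disjoint."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0


def find_string(alto_xml, region):
    """Return the ALTO String element that best overlaps region, or None."""
    root = ET.fromstring(alto_xml)
    best, best_area = None, 0
    for el in root.iter():
        # Compare local names so that any ALTO namespace version is accepted.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        try:
            box = tuple(float(el.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        except (TypeError, ValueError):
            continue  # element without usable coordinates
        area = overlap_area(region, box)
        if area > best_area:
            best, best_area = el, area
    return best
```

Calling `find_string(alto_xml, parse_xywh(annotation["on"]))` returns the matching element, from which attributes such as `CONTENT` can be read. The amount of guesswork involved (units, scaling, ties between overlapping boxes) is the reason a direct reference would be preferable.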

I suppose that for this specific use case (getting access to the raw XML), we need another annotation list that references segments of the external XML file (http://iiif.io/api/presentation/2.1/#segments):

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://example.org/iiif/book1/annotation/anno1",
  "@type": "oa:Annotation",
  "motivation": "sc:painting",
  "resource":{
    "@id": "http://example.org/iiif/book1/res/alto.xml#xpointer(//String[@id='Str_001'])",
    "@type": "dctypes:Text",
    "format": "application/alto+xml"
  },
  "on": "http://example.org/iiif/book1/canvas/p1#xywh=100,100,500,300"
}
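
A client could then resolve such a resource `@id` without any geometry. The Python sketch below is only a rough illustration: it handles just the simple `xpointer(//Tag[@attr='value'])` shape used in the example above, not XPointer in general, and the helper names are invented here. ALTO itself spells the identifier attribute `ID` in upper case, so the lookup also tries the upper-case attribute name.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urldefrag

# Only the single pattern xpointer(//Tag[@attr='value']) is recognised.
_XPOINTER = re.compile(
    r"""^xpointer\(//(?P<tag>[\w.-]+)\[@(?P<attr>[\w.-]+)=(?P<q>['"])(?P<value>.*?)(?P=q)\]\)$"""
)


def split_segment(resource_id):
    """Split a resource @id into (document URL, tag, attribute, value)."""
    url, fragment = urldefrag(resource_id)
    match = _XPOINTER.match(unquote(fragment))
    if match is None:
        raise ValueError("unsupported fragment: %r" % fragment)
    return url, match["tag"], match["attr"], match["value"]


def resolve(alto_xml, tag, attr, value):
    """Return the first element with the given local name and attribute value."""
    root = ET.fromstring(alto_xml)
    for el in root.iter():
        # Ignore the namespace so that //String matches namespaced ALTO files.
        if el.tag.rsplit("}", 1)[-1] != tag:
            continue
        if el.get(attr) == value or el.get(attr.upper()) == value:
            return el
    return None
```

Usage would be: call `split_segment` on the resource `@id`, fetch the returned URL with whatever HTTP client is at hand, and pass the response body with the remaining three values to `resolve`.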