IIIF-Commons / iiif-helpers

MIT License

Added transcription helpers for extracting text from a canvas #15

stephenwf closed this 3 months ago

stephenwf commented 4 months ago

Adds a transcription helper.

It will find the following kinds of transcription:

Cookbook:

Plaintext rendering on canvas:

"rendering": [
  {
    "id": "https://fixtures.iiif.io/video/indiana/volleyball/volleyball.txt",
    "type": "Text",
    "label": {
      "en": [
        "Transcript"
      ]
    },
    "format": "text/plain"
  }
]
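As a rough illustration of the plaintext case, the sketch below filters a canvas's rendering list for text/plain entries. The CanvasLike and RenderingLike shapes and the findPlaintextRenderings name are assumptions made for this example; they are not the helper's actual API, and the canvas id is a placeholder.

```typescript
// Minimal shapes for this sketch only; not the library's own types.
interface RenderingLike {
  id: string;
  type: string;
  format?: string;
  label?: Record<string, string[]>;
}

interface CanvasLike {
  id: string;
  rendering?: RenderingLike[];
}

// Hypothetical helper: returns every rendering that advertises plain text.
function findPlaintextRenderings(canvas: CanvasLike): RenderingLike[] {
  return (canvas.rendering ?? []).filter(
    (r) => r.type === "Text" && r.format === "text/plain"
  );
}

const exampleCanvas: CanvasLike = {
  id: "https://example.org/canvas/1", // placeholder id
  rendering: [
    {
      id: "https://fixtures.iiif.io/video/indiana/volleyball/volleyball.txt",
      type: "Text",
      label: { en: ["Transcript"] },
      format: "text/plain",
    },
    // A non-plaintext rendering that should be ignored.
    { id: "https://example.org/file.pdf", type: "Text", format: "application/pdf" },
  ],
};

const plaintextRenderings = findPlaintextRenderings(exampleCanvas);
```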

VTT annotation body on AV canvases:

"annotations": [
  {
    "id": "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas/page2",
    "type": "AnnotationPage",
    "items": [
      {
        "id": "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas/page2/a1",
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {
          "id": "https://fixtures.iiif.io/video/indiana/lunchroom_manners/lunchroom_manners.vtt",
          "type": "Text",
          "format": "text/vtt",
          "label": {
            "en": [
              "Captions in WebVTT format"
            ]
          },
          "language": "en"
        },
        "target": "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas"
      }
    ]
  }
]
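A minimal sketch of how the VTT case could be detected, assuming the annotation pages are already loaded as plain objects: it keeps bodies of supplementing annotations whose format is text/vtt. The type names and findVttBodies are hypothetical and only serve this example.

```typescript
// Minimal shapes for this sketch only; not the library's own types.
interface VttBodyLike {
  id: string;
  type: string;
  format?: string;
  language?: string;
}

interface VttAnnotationLike {
  id: string;
  type: string;
  motivation?: string | string[];
  body?: VttBodyLike | VttBodyLike[];
  target?: string;
}

interface VttAnnotationPageLike {
  id: string;
  type: string;
  items?: VttAnnotationLike[];
}

// Hypothetical helper: collects text/vtt bodies from supplementing annotations.
function findVttBodies(pages: VttAnnotationPageLike[]): VttBodyLike[] {
  const found: VttBodyLike[] = [];
  for (const page of pages) {
    for (const anno of page.items ?? []) {
      // motivation may be a single string or a list of strings.
      const motivations: Array<string | undefined> = Array.isArray(anno.motivation)
        ? anno.motivation
        : [anno.motivation];
      if (motivations.indexOf("supplementing") === -1) continue;
      // body may be a single object or a list of objects.
      const bodies: VttBodyLike[] = anno.body
        ? Array.isArray(anno.body) ? anno.body : [anno.body]
        : [];
      for (const body of bodies) {
        if (body.format === "text/vtt") found.push(body);
      }
    }
  }
  return found;
}

const captionPages: VttAnnotationPageLike[] = [
  {
    id: "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas/page2",
    type: "AnnotationPage",
    items: [
      {
        id: "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas/page2/a1",
        type: "Annotation",
        motivation: "supplementing",
        body: {
          id: "https://fixtures.iiif.io/video/indiana/lunchroom_manners/lunchroom_manners.vtt",
          type: "Text",
          format: "text/vtt",
          language: "en",
        },
        target: "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas",
      },
      {
        // Placeholder painting annotation that should be skipped.
        id: "https://example.org/anno/painting",
        type: "Annotation",
        motivation: "painting",
        body: { id: "https://example.org/video.mp4", type: "Video", format: "video/mp4" },
        target: "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas",
      },
    ],
  },
];

const vttBodies = findVttBodies(captionPages);
```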

OCR annotations:

Or linking directly to an ALTO file (future, not implemented):

"rendering": [
  {
    "id": "https://iiif.io/api/cookbook/recipe/0068-newspaper/newspaper_issue_1-alto_p2.xml",
    "type": "Text",
    "format": "application/xml",
    "profile": "http://www.loc.gov/standards/alto/",
    "label": {
      "en": [
        "ALTO XML"
      ]
    }
  }
]

It will produce one standard format that covers temporal transcriptions, plaintext, and positional plaintext, including selectors.

interface Transcription {
  id: string;
  source: any;
  plaintext: string;
  segments: Array<{
    text: string;
    textRaw: string;
    granularity?: 'word' | 'line' | 'paragraph' | 'block' | 'page';
    language?: string;
    selector?: ParsedSelector;
    startRaw?: string;
    endRaw?: string;
  }>;
}

ParsedSelector includes spatial and temporal information, taken either from an annotation or from VTT. The VTT parsing is very simple at the moment, because external libraries for it are heavy. If there is only plaintext, there are no segments.
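To give a feel for what "very simple" VTT parsing might look like, here is a sketch that splits a file into cue blocks and reads the timing line of each. It is not the parser in this PR: CueSegment, vttTimeToSeconds and parseSimpleVtt are hypothetical names, the numeric start and end fields stand in for whatever ParsedSelector actually holds, and cue settings, regions and styles are ignored.

```typescript
// Illustrative segment shape; start/end (seconds) stand in for a real selector.
interface CueSegment {
  text: string;
  textRaw: string;
  startRaw: string;
  endRaw: string;
  start: number;
  end: number;
}

// Converts "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds.
function vttTimeToSeconds(time: string): number {
  return time
    .split(":")
    .map(Number)
    .reduce((total, part) => total * 60 + part, 0);
}

function parseSimpleVtt(vtt: string): CueSegment[] {
  const segments: CueSegment[] = [];
  // Normalise line endings, then split on blank lines to get cue blocks.
  const blocks = vtt.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    let timingIndex = -1;
    for (let i = 0; i < lines.length; i++) {
      if (lines[i].indexOf("-->") !== -1) {
        timingIndex = i;
        break;
      }
    }
    if (timingIndex === -1) continue; // header, NOTE or STYLE block
    const match = /^(\S+)\s+-->\s+(\S+)/.exec(lines[timingIndex].trim());
    if (!match) continue;
    const textRaw = lines.slice(timingIndex + 1).join("\n").trim();
    // Strip cue tags such as <v Narrator> and collapse whitespace.
    const text = textRaw.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    segments.push({
      text,
      textRaw,
      startRaw: match[1],
      endRaw: match[2],
      start: vttTimeToSeconds(match[1]),
      end: vttTimeToSeconds(match[2]),
    });
  }
  return segments;
}

const sampleVtt = [
  "WEBVTT",
  "",
  "1",
  "00:00:01.000 --> 00:00:04.500",
  "<v Narrator>Hello there.",
  "",
  "00:01:02.500 --> 00:01:05.000",
  "Second cue,",
  "on two lines.",
].join("\n");

const cues = parseSimpleVtt(sampleVtt);
// Joining the cue text gives a plaintext view of the same transcription.
const cuePlaintext = cues.map((cue) => cue.text).join("\n");
```

Keeping both the raw timestamps and the cleaned text mirrors the textRaw, startRaw and endRaw fields in the interface above, so a client can fall back to the original values if the simple parsing is not enough.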

A viewer could start by showing just the plaintext and add optional support for segments later.
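That incremental approach could look something like the sketch below, where a viewer renders plaintext lines by default and only uses segments when asked to and when they exist. TranscriptionView and transcriptionLines are hypothetical names, and the shape is a trimmed-down copy of the interface above.

```typescript
// Trimmed-down version of the Transcription shape, for this sketch only.
interface TranscriptionView {
  id: string;
  plaintext: string;
  segments: Array<{ text: string; startRaw?: string; endRaw?: string }>;
}

// Hypothetical viewer helper: returns the lines a viewer would display.
function transcriptionLines(t: TranscriptionView, useSegments: boolean): string[] {
  // Plaintext-only transcriptions have no segments, so fall back to plaintext.
  if (!useSegments || t.segments.length === 0) {
    return t.plaintext.split("\n");
  }
  // With segments, prefix each line with its raw start time when available.
  return t.segments.map((s) => (s.startRaw ? `[${s.startRaw}] ${s.text}` : s.text));
}

const viewExample: TranscriptionView = {
  id: "https://example.org/transcription/1", // placeholder id
  plaintext: "Hello there.\nGoodbye.",
  segments: [
    { text: "Hello there.", startRaw: "00:00:01.000", endRaw: "00:00:04.500" },
    { text: "Goodbye." },
  ],
};

const plainLines = transcriptionLines(viewExample, false);
const segmentLines = transcriptionLines(viewExample, true);
```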

Some new helpers too:

stephenwf commented 4 months ago

At the moment, we are losing track of the Annotation target when parsing. It will very likely be the Canvas, but it could be

Clients that provide navigation using the selector might also need to check that it has the right target.
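A client-side check along these lines might be enough, assuming the annotation target were kept alongside the parsed selector (which, as noted, it currently is not). AnnoTargetLike, targetSourceId and targetsCanvas are hypothetical names; the sketch handles a plain URI string with an optional fragment, a resource object with an id, and a SpecificResource with a source.

```typescript
// Target shapes this sketch understands; real targets may be richer.
type AnnoTargetLike =
  | string
  | { id: string }
  | { type: "SpecificResource"; source: string | { id: string } };

// Drops any "#..." fragment such as "#t=10,20" or "#xywh=0,0,100,100".
function stripFragment(uri: string): string {
  const hashIndex = uri.indexOf("#");
  return hashIndex === -1 ? uri : uri.slice(0, hashIndex);
}

// Resolves the id of the resource the annotation points at.
function targetSourceId(target: AnnoTargetLike): string {
  if (typeof target === "string") return stripFragment(target);
  if ("source" in target) {
    return typeof target.source === "string"
      ? stripFragment(target.source)
      : target.source.id;
  }
  return stripFragment(target.id);
}

// True when the annotation target resolves to the given canvas.
function targetsCanvas(target: AnnoTargetLike, canvasId: string): boolean {
  return targetSourceId(target) === canvasId;
}

const cookbookCanvasId = "https://iiif.io/api/cookbook/recipe/0219-using-caption-file/canvas";
const matchesPlain = targetsCanvas(cookbookCanvasId, cookbookCanvasId);
const matchesFragment = targetsCanvas(cookbookCanvasId + "#t=10,20", cookbookCanvasId);
const matchesSpecific = targetsCanvas(
  { type: "SpecificResource", source: { id: cookbookCanvasId } },
  cookbookCanvasId
);
const matchesOther = targetsCanvas("https://example.org/other-canvas", cookbookCanvasId);
```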

stephenwf commented 4 months ago

We also need to pass in a language, so that the transcription can check for choices structured like this: https://iiif.io/api/cookbook/recipe/0074-multiple-language-captions/
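One possible shape for that language check, sketched under the assumption that the annotation body is either a single text resource or a Choice whose items each carry a language: pick the item matching the requested language, otherwise fall back to the first item, which a Choice treats as the default. LangTextBody, LangChoiceBody and pickBodyForLanguage are hypothetical names, and the example URLs are placeholders rather than the cookbook's fixtures.

```typescript
// Body shapes for this sketch only; not the library's own types.
interface LangTextBody {
  id: string;
  type: string;
  format?: string;
  language?: string;
}

interface LangChoiceBody {
  type: "Choice";
  items: LangTextBody[];
}

// Hypothetical helper: resolves a body (or Choice of bodies) for a language.
function pickBodyForLanguage(
  body: LangTextBody | LangChoiceBody,
  language: string
): LangTextBody | undefined {
  // A plain body has no items, so there is nothing to choose between.
  if (!("items" in body)) return body;
  const exact = body.items.filter((item) => item.language === language)[0];
  // Fall back to the first item, the default option of a Choice.
  return exact ?? body.items[0];
}

const captionChoice: LangChoiceBody = {
  type: "Choice",
  items: [
    { id: "https://example.org/captions.en.vtt", type: "Text", format: "text/vtt", language: "en" },
    { id: "https://example.org/captions.it.vtt", type: "Text", format: "text/vtt", language: "it" },
  ],
};

const italianBody = pickBodyForLanguage(captionChoice, "it");
const fallbackBody = pickBodyForLanguage(captionChoice, "fr");
```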

stephenwf commented 3 months ago

This still needs more testing, so I will leave it open.