IIIF / api

Source for API and model specifications documents (api and model)
http://iiif.io/api

Add `transcribing` motivation in addition to `painting` #1258

Closed tomcrane closed 6 years ago

tomcrane commented 7 years ago

From AV Working Group:

Although a client might render captions (e.g., on a video) in the same way as other painting motivation annotations, a specific captioning motivation would allow the client to present UI to the user to choose captions, e.g., for compliance with accessibility requirements, and for a better user experience. This is a common feature of video players that a user would expect to see.

tomcrane commented 7 years ago

This could also be used by a viewer that is capable of rendering WebVTT to tell that some of the content annotations are captions, and it should not render the WebVTT annotations (see https://raw.githubusercontent.com/IIIF/iiif-av/master/source/api/av/examples/11a.json).

jeffreycwitt commented 7 years ago

This seems to be a good example of a larger issue where clients should be expected to display different "kinds" of texts in different ways. A generic "painting" motivation doesn't seem to be enough to alert a client to these differences.

To the "captioning" motivations, others like "normalized-transcription", "diplomatic-transcription", and "translation" could be added. I suspect there are others. A client might want to be able to create different views for these different "kinds" of texts.

In the past I've been handling this with Layers, but it would be best to handle this in a way that is consistent with what the A/V group is doing.

In short, this seems like an issue bigger than A/V. Is it possible to add a more general GitHub label, so that this issue can be discussed as part of a larger discussion of IIIF supported motivations?

zimeon commented 7 years ago

Note relation to #511

zimeon commented 7 years ago

Sense in Toronto discussion is that we want to allow refining motivations without adding an ever increasing number of motivations to the Presentation API specification. Thus, we might add a captioning motivation in another namespace (community controlled list, say iiifc: prefix for illustration, the prefix would be defined in presentation context) but this would be in addition to the painting motivation (that is in the spec), and would thus also support a local addition ext:special-ing in a set of motivations:

"motivation": ["painting", "iiifc:captioning", "ext:special-ing"]

Would need to understand process for consideration of how to add other motivations to the community namespace.
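A minimal sketch of how a client might read such a list-valued motivation, assuming the annotation is already parsed into a Python dict. The helper names (`motivations`, `is_painting`, `refinements`) are hypothetical and not part of any IIIF library; the `iiifc:` and `ext:` values are the illustrative ones from the example above.

```python
def motivations(annotation):
    """Return the annotation's motivation(s) as a list of strings.

    Accepts either a single string or a list, since both forms may appear.
    """
    value = annotation.get("motivation", [])
    return [value] if isinstance(value, str) else list(value)


def is_painting(annotation):
    """True if the spec-defined painting motivation is present."""
    return "painting" in motivations(annotation)


def refinements(annotation):
    """Return every motivation other than painting, in document order."""
    return [m for m in motivations(annotation) if m != "painting"]


# Illustrative annotation using the prefixes from the discussion above.
anno = {"motivation": ["painting", "iiifc:captioning", "ext:special-ing"]}
```

A client that does not recognise a refinement can ignore it and fall back to the painting behaviour, which is the point of keeping painting in the list.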

tomcrane commented 7 years ago

The more I think about this the less sure I am that captioning - the suggestion example that leads to these issues - is a subclass of painting. It doesn't have to be, in the list suggestion above. Many clients will render such annotations on top of the image or video, and often the annotation will target fragments of the time dimension only. But the intent of captioning can be met by other means that would still satisfy the publisher of those captions.

Here's what the spec says about painting:

Note that all resources which are to be displayed as part of the representation are given the motivation of “sc:painting”, regardless of whether they are images or not. For example, a transcription of the text in a page is considered “painting” as it is a representation of the object, whereas a comment about the page is not.

Transcribed text and film subtitles are representations of the object, that could be painted on the object - "displayed as part of the representation" but can still be useful - occasionally more useful - if rendered somewhere else, or read aloud, or whatever.

There's still a purely presentation-semantics distinction between "displayed as part of the representation" and "render this in annotation space, on top of the content before it", a distinction that captioning and transcribing capture. A client is not doing the right thing if it fails to display the Biblissima cut-outs in the correct place, but it would be reasonable for it to move the transcriptions off to the side if I asked it to, or read them to me, and it would be reasonable for captions to overflow their bounding boxes or, if targeting the whole canvas, to appear as surtitles or subtitles outside of the canvas bounds, depending on UI choices of the client and/or user.

However, it would not be acceptable for a client to render the text in the Fire example off of the canvas, just because it happens to be text. Even in purely presentation terms, they are different kinds of painting.

barmintor commented 7 years ago

We're mulling this in a related project, in which we intend to indicate in a IIIF-AV manifest an editable transcript (possibly in one of several formats) and descriptive annotations for segments. Combinations of type and format are not really up to the job of clarifying to the client what these are for, especially if we are using (for example) a IIIF range as the "canonical" serialization for captions that might also be available as WebVTT or SRT.

azaroth42 commented 6 years ago

I agree with @tomcrane that the distinction to capture is not the semantics of a "caption" or a "transcription" or an "edition" or a "translation" but the presentational requirements or desired behavior. A second motivation that captures the notion of "this is a representation of the information in the canvas, rather than a representation of that part of the canvas" would then allow clients to have sufficient information to know the most appropriate display for the user. The target area should be associated with the content, but the requirement is not necessarily to paint it in exactly that spot, like it is for painting.

It's not describing, as that would be a description of the canvas area rather than the information in the canvas area. I don't think we want to include the domain-specific semantics that @jeffreycwitt suggests. It's not just highlighting, as the content needs to be rendered somewhere and could be rendered in the target area but need not be. If it MUST be rendered there, then the motivation is painting.

The consistent aspect between closed captions for a video, the lyrics of an audio track, and the transcription of hand-written text is that the body is always textual. Having an audio "transcription" of an audio track would be entirely pointless. Generically, the text is making the non-textual version more accessible to either a machine or a human, as text can be processed in more ways than audio, image, video or data.

Proposal: Add a second motivation of transcribing (c.f. https://www.merriam-webster.com/dictionary/transcribe) with the above presentational semantics.
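As a rough sketch of the presentational semantics proposed here (not spec text), a client could map motivations to a placement rule. The function name `placement` and the three return labels are hypothetical, chosen only for illustration.

```python
def motivations(annotation):
    """Return the annotation's motivation(s) as a list of strings."""
    value = annotation.get("motivation", [])
    return [value] if isinstance(value, str) else list(value)


def placement(annotation):
    """Decide where the body may be rendered, following the proposal above.

    "at-target":     painting, the body must be rendered in the target area.
    "client-choice": transcribing, the body is associated with the target
                     but may be shown elsewhere (side panel, read aloud, ...).
    "not-content":   anything else, e.g. a comment about the canvas.
    """
    m = motivations(annotation)
    if "painting" in m:
        return "at-target"
    if "transcribing" in m:
        return "client-choice"
    return "not-content"
```

Checking painting first reflects the statement above that if the content MUST be rendered in the target area, the motivation is painting.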

tomcrane commented 6 years ago

:+1:

Applying this to the use case raised by @jronallo:

Client software needs to present caption controls in a clearly identifiable piece of user interface, to meet accessibility requirements, so that the user can quickly enable captions in their preferred language.

A viewer playing a video can assume that for video content, likely candidates for the "captions" menu will be identified by a transcribing motivation, so only those anno lists are grouped under that UI.

Further extra-spec refining motivations could still be used, for specific communities and in specific software, but I think this use of transcribing meets the general presentation requirement.
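A sketch of that grouping, assuming annotations are plain dicts and that the language of each textual body is available in a `body.language` field (an assumption made for illustration; real annotation lists may carry language differently). `caption_menu` is a hypothetical helper name.

```python
from collections import defaultdict


def motivations(annotation):
    """Return the annotation's motivation(s) as a list of strings."""
    value = annotation.get("motivation", [])
    return [value] if isinstance(value, str) else list(value)


def caption_menu(annotations):
    """Group transcribing annotations by body language.

    Returns {language: [annotation, ...]}; annotations without a language
    are filed under "und" (undetermined).
    """
    menu = defaultdict(list)
    for anno in annotations:
        if "transcribing" in motivations(anno):
            lang = anno.get("body", {}).get("language", "und")
            menu[lang].append(anno)
    return dict(menu)


# Illustrative data only.
annos = [
    {"id": "a1", "motivation": "transcribing",
     "body": {"language": "en", "value": "Hello"}},
    {"id": "a2", "motivation": "transcribing",
     "body": {"language": "fr", "value": "Bonjour"}},
    {"id": "a3", "motivation": "painting", "body": {"type": "Video"}},
]
menu = caption_menu(annos)
```

The keys of `menu` are the likely candidates for a language picker in the captions UI; the painting annotation is left out of it.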

zimeon commented 6 years ago

:+1: on last two comments (and still to the general idea of IIIF extensions in a community controlled namespace -- I feel sure we'll be back discussing another one sometime...)

jronallo commented 6 years ago

What if there are cases where a video has embedded text (burned in like a news broadcast segment title) and captions for accessibility for the audio/speech? Those could both be a transcribing motivation but would be different in intent and presentation. Or would the case where you're "transcribing" text from within a video not be a transcribing motivation?

azaroth42 commented 6 years ago

I think that would be transcribing in the same way that the text is similarly embedded in a printed page, and the annotation is making it accessible by providing the text in a more machine and human processable form

jronallo commented 6 years ago

But then you can't make the assumption that for a video transcribing means captions (as @tomcrane suggests in https://github.com/IIIF/api/issues/1258#issuecomment-345679304 ). transcribing could just as well mean text within the video which shouldn't be provided as captions.

A viewer playing a video can assume that for video content, likely candidates for the "captions" menu will be identified by a transcribing motivation, so only those anno lists are grouped under that UI.

I guess any annotations with transcribing could still be "likely candidates" for a captions menu though what is a reasonable further processing algorithm to make that decision?

If there are two different kinds of transcribing it is not safe to assume that transcribing for a video means captions. Explicit is better than implicit.
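One possible way to make this explicit, sketched under the assumption that a community-defined refinement such as the illustrative `iiifc:captioning` value mentioned earlier in this thread were available. Neither that value nor the `split_transcriptions` helper is defined by any spec.

```python
# Illustrative refinement from earlier in the thread; not a defined value.
CAPTIONING = "iiifc:captioning"


def motivations(annotation):
    """Return the annotation's motivation(s) as a list of strings."""
    value = annotation.get("motivation", [])
    return [value] if isinstance(value, str) else list(value)


def split_transcriptions(annotations):
    """Separate explicit captions from other transcribing annotations.

    Returns (captions, other). Annotations without the transcribing
    motivation are ignored entirely.
    """
    captions, other = [], []
    for anno in annotations:
        m = motivations(anno)
        if "transcribing" not in m:
            continue
        (captions if CAPTIONING in m else other).append(anno)
    return captions, other


# Illustrative data: spoken-word captions vs. burned-in on-screen text.
annos = [
    {"id": "speech", "motivation": ["transcribing", CAPTIONING]},
    {"id": "onscreen-title", "motivation": "transcribing"},
    {"id": "video", "motivation": "painting"},
]
captions, other = split_transcriptions(annos)
```

Only the first list would feed the captions menu; the second holds transcriptions such as burned-in titles that a client can present some other way.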

azaroth42 commented 6 years ago

Yes, I agree that transcribing on a video is not necessarily captions (or subtitles, or anything else). I think "captions" and "subtitles" fall into the same bucket as "diplomatic-transcription" and "edition" -- a specific requirement with domain-specific semantics.

The transcribing gets us closer to it, in a domain-neutral way, but isn't explicitly the US notion of "captions" (noting that the UK has a different notion, and likely other countries and domains too)

azaroth42 commented 6 years ago

Discussion on 11/22 call: Seems reasonable but needs well defined extension mechanism to get the content / domain specific details.

aisaac commented 6 years ago

A quick, stupid check, as I'm really not sure I understand the discussion about English here: would using 'transcribing' in this (A/V) context prevent us from using 'transcribing' in other contexts? I'm thinking especially of it as a motivation for annotations that represent transcriptions of text, such as in projects like http://blogs.ucl.ac.uk/transcribe-bentham/ ?

azaroth42 commented 6 years ago

Yes, that's the intent of it :) That the use of transcribing is to signal to the client that the annotation represents information that is already present in the rendering (it could be transcription of spoken language) and can be presented differently to a painting annotation.

aisaac commented 6 years ago

Thanks for the explanation, Rob! I am only half re-assured though. Is your answer hinting that everything that can be presented differently from a painting annotation will be expected to have the transcribing motivation?

azaroth42 commented 6 years ago

Editorial question: Do "painting" and "transcribing" get defined in Section 3 along with the properties? Otherwise where?

tomcrane commented 6 years ago

In the current draft, painting is used in section 5.1 before it is first properly mentioned in 5.3 (Canvas) and 5.5 (Annotations). What about a slightly expanded (but still brief) discussion of annotations earlier on, where the special IIIF motivations of painting and now transcribing could be introduced?

2.1 defines Content as one of the four basic types, then 2.2 defines Annotation as one of the additional types. Somewhere here, there could be a brief section that establishes annotation as the mechanism for linking to content, getting some of the work of 5.5 done earlier on in the spec, and establishing annotations as content as a foundation of the model. I feel lots of people miss this idea when they first encounter IIIF. By the time the reader gets to section 3, they should have some notion of this special role of annotations in the spec, with the various further mentions in 5 doing the legwork.

This may require a bit of a rewrite of 1 and 2. Section 1 needs broadening anyway for non-image use cases (although images should still feature in the majority of examples). Or it might be as simple as moving Annotation to 2.1 and discussing it and Content together.

Then, painting and transcribing could be introduced in section 3.

jronallo commented 6 years ago

Just chiming in about the importance of annotation to the model that can easily be overlooked so would be good to introduce annotation earlier for better understanding of IIIF.

azaroth42 commented 6 years ago

:+1: In 1.0 and early 2.0 days it was thought that Annotations were machinery to be hidden away from the casual user, but I think we see their centrality now after adoption and application to more than just the manuscript domain. I'll work on 1 and 2.

tomcrane commented 6 years ago

Agreement on AV call -

+1 to transcribing, but need to take general extension of motivations to a general technical IIIF call. Communities can refine motivations, e.g. for captioning, but these should be common across Avalon and other AV interests within IIIF.

azaroth42 commented 6 years ago

Closed by #1351