Closed by tomcrane 6 years ago
This could also be used by a viewer that is capable of rendering WebVTT to determine that some of the content annotations are captions, and that it should not render the WebVTT annotations (see https://raw.githubusercontent.com/IIIF/iiif-av/master/source/api/av/examples/11a.json).
This seems to be a good example of a larger issue: clients should be expected to display different "kinds" of texts in different ways. A generic "painting" motivation doesn't seem to be enough to alert a client to these differences.
To the "captioning" motivation, others like "normalized-transcription", "diplomatic-transcription", and "translation" could be added. I suspect there are others. A client might want to be able to create different views for these different "kinds" of texts.
In the past I've been handling this with Layers, but it would be best to handle it in a way that is consistent with what the A/V group is doing.
In short, this seems like an issue bigger than A/V. Is it possible to add a more general GitHub label, so that this issue can be discussed as part of a larger discussion of IIIF-supported motivations?
Note relation to #511
Sense in the Toronto discussion is that we want to allow refining motivations without adding an ever-increasing number of motivations to the Presentation API specification. Thus, we might add a `captioning` motivation in another namespace (a community-controlled list, say an `iiifc:` prefix for illustration; the prefix would be defined in the presentation context), but this would be in addition to the `painting` motivation (which is in the spec), and would thus also support a local addition such as `ext:special-ing` in a set of motivations:
"motivation": ["painting", "iiifc:captioning", "ext:special-ing"]
Would need to understand process for consideration of how to add other motivations to the community namespace.
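As a sketch of how a client might handle such a multi-valued motivation, here is an illustrative Python fragment. The annotation structure and the `iiifc:`/`ext:` prefixes are assumptions taken from the discussion above, not registered IIIF namespaces.

```python
# Illustrative annotation with a set of motivations: the spec-defined
# "painting" plus hypothetical community and local refinements.
annotation = {
    "id": "https://example.org/anno/1",
    "type": "Annotation",
    "motivation": ["painting", "iiifc:captioning", "ext:special-ing"],
    "body": {"type": "Text", "format": "text/vtt"},
    "target": "https://example.org/canvas/1#t=0,30",
}

def motivations(anno):
    """Normalize motivation to a list (it may be a string or a list)."""
    m = anno.get("motivation", [])
    return [m] if isinstance(m, str) else list(m)

# A client that only understands "painting" can still render the
# annotation, ignoring the refining motivations it doesn't know.
assert "painting" in motivations(annotation)
```

The point of the list form is exactly this graceful degradation: unknown refinements are additive, and a baseline client falls back to `painting`.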
The more I think about this the less sure I am that `captioning` (the suggested example that leads to these issues) is a subclass of `painting`. It doesn't have to be, in the list suggestion above. Many clients will render such annotations on top of the image or video, and often the annotation will target fragments of the time dimension only. But the intent of `captioning` can be met by other means that would still satisfy the publisher of those captions.
Here's what the spec says about `painting`:
> Note that all resources which are to be displayed as part of the representation are given the motivation of "sc:painting", regardless of whether they are images or not. For example, a transcription of the text in a page is considered "painting" as it is a representation of the object, whereas a comment about the page is not.
Transcribed text and film subtitles are representations of the object, that could be painted on the object - "displayed as part of the representation" but can still be useful - occasionally more useful - if rendered somewhere else, or read aloud, or whatever.
There's a still purely presentation-semantics distinction between "displayed as part of the representation" and "render this in annotation space, on top of the content before it". A distinction that `captioning` and `transcribing` capture. A client is not doing the right thing if it fails to display the Biblissima cut-outs in the correct place, but it would be reasonable for it to move the transcriptions off to the side if I asked it to, or read them to me; and it would be reasonable for captions to overflow their bounding boxes, or, if targeting the whole canvas, to appear as surtitles or subtitles outside of the canvas bounds, depending on UI choices of the client and/or user.
However, it would not be acceptable for a client to render the text in the Fire example off of the canvas, just because it happens to be text. Even in purely presentation terms, they are different kinds of `painting`.
We're mulling this in a related project, in which we intend to indicate in an IIIF-AV manifest an editable transcript (possibly in one of several formats) and descriptive annotations for segments. Combinations of `type` and `format` are not really up to the job of clarifying to the client what these are for, especially if we are using (for example) an IIIF range as the "canonical" serialization for captions that might also be available as WebVTT or SRT.
I agree with @tomcrane that the distinction to capture is not the semantics of a "caption" or a "transcription" or an "edition" or a "translation" but the presentational requirements or desired behavior. A second motivation that captures the notion of "this is a representation of the information in the canvas, rather than a representation of that part of the canvas" would then allow clients to have sufficient information to know the most appropriate display for the user. The target area should be associated with the content, but the requirement is not necessarily to paint it in exactly that spot, like it is for painting.
It's not `describing`, as that would be a description of the canvas area rather than the information in the canvas area. I don't think we want to include the domain-specific semantics that @jeffreycwitt suggests. It's not just `highlighting`, as the content needs to be rendered somewhere, and could be rendered in the target area but need not be. If it MUST be rendered there, then the motivation is `painting`.
The consistent aspect between closed captions for a video, the lyrics of an audio track, and the transcription of hand-written text is that the body is always textual. Having an audio "transcription" of an audio track would be entirely pointless. Generically, the text is making the non-textual version more accessible to either a machine or a human, as text can be processed in more ways than audio, image, video or data.
Proposal: Add a second motivation of `transcribing` (cf. https://www.merriam-webster.com/dictionary/transcribe) with the above presentational semantics.
:+1:
Applying this to the use case raised by @jronallo:
Client software needs to present caption controls in a clearly identifiable piece of user interface, to meet accessibility requirements, so that the user can quickly enable captions in their preferred language.
A viewer playing a video can assume that for video content, likely candidates for the "captions" menu will be identified by a `transcribing` motivation, so only those anno lists are grouped under that UI.
Further extra-spec refining motivations could still be used, for specific communities and in specific software, but I think this use of `transcribing` meets the general presentation requirement.
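The grouping just described could be sketched as follows. The annotation-list shape here is an assumption for illustration, not a fixed IIIF serialization.

```python
def motivations(anno):
    """Normalize motivation to a list (it may be a string or a list)."""
    m = anno.get("motivation", [])
    return [m] if isinstance(m, str) else list(m)

def caption_candidates(anno_lists):
    """Return the annotation lists that are candidates for a captions menu:
    those containing at least one annotation with a transcribing motivation."""
    return [
        al for al in anno_lists
        if any("transcribing" in motivations(a) for a in al.get("items", []))
    ]

# Hypothetical annotation lists attached to a video canvas.
anno_lists = [
    {"id": "captions-en",
     "items": [{"motivation": ["painting", "transcribing"]}]},
    {"id": "commentary",
     "items": [{"motivation": "commenting"}]},
]
assert [al["id"] for al in caption_candidates(anno_lists)] == ["captions-en"]
```

Only the first list would be offered in the captions UI; the commentary list is left to whatever other display the client chooses.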
:+1: on last two comments (and still to the general idea of IIIF extensions in a community controlled namespace -- I feel sure we'll be back discussing another one sometime...)
What if there are cases where a video has embedded text (burned in, like a news broadcast segment title) and captions for accessibility for the audio/speech? Those could both be a `transcribing` motivation but would be different in intent and presentation. Or would the case where you're "transcribing" text from within a video not be a `transcribing` motivation?
I think that would be `transcribing` in the same way that the text is similarly embedded in a printed page, and the annotation is making it accessible by providing the text in a more machine- and human-processable form.
But then you can't make the assumption that for a video `transcribing` means captions (as @tomcrane suggests in https://github.com/IIIF/api/issues/1258#issuecomment-345679304). `transcribing` could just as well mean text within the video which shouldn't be provided as captions.
> A viewer playing a video can assume that for video content, likely candidates for the "captions" menu will be identified by a transcribing motivation, so only those anno lists are grouped under that UI.
I guess any annotations with `transcribing` could still be "likely candidates" for a captions menu, though what would a reasonable further processing algorithm be to make that decision?
If there are two different kinds of `transcribing`, it is not safe to assume that `transcribing` for a video means captions. Explicit is better than implicit.
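One possible further processing heuristic, offered purely as an assumption rather than anything specified: a client could narrow `transcribing` annotations to caption candidates by also inspecting the body, e.g. a timed-text format such as WebVTT or SRT, or a plain text body targeting a fragment of the time dimension.

```python
# Hypothetical heuristic; annotation shapes are illustrative only.
TIMED_TEXT_FORMATS = {"text/vtt", "application/x-subrip"}

def looks_like_captions(anno):
    """Guess whether a transcribing annotation is a caption track,
    based on body format or a time-fragment target."""
    body = anno.get("body", {})
    target = anno.get("target", "")
    timed_format = body.get("format") in TIMED_TEXT_FORMATS
    time_fragment = "#t=" in target  # targets a span of the time dimension
    return timed_format or (body.get("type") == "Text" and time_fragment)

# A WebVTT body is a strong caption signal; burned-in text transcribed
# as an untimed region annotation would not match.
assert looks_like_captions({
    "motivation": ["painting", "transcribing"],
    "body": {"type": "Text", "format": "text/vtt"},
    "target": "https://example.org/canvas/1",
})
```

This is only a guess-stage filter; as the discussion concludes, an explicit refining motivation would be more reliable than any such inference.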
Yes, I agree that `transcribing` on a video is not necessarily captions (or subtitles, or anything else). I think "captions" and "subtitles" fall into the same bucket as "diplomatic-transcription" and "edition": a specific requirement with domain-specific semantics.
The `transcribing` motivation gets us closer to it, in a domain-neutral way, but isn't explicitly the US notion of "captions" (noting that the UK has a different notion, and likely other countries and domains do too).
Discussion on 11/22 call: Seems reasonable but needs well defined extension mechanism to get the content / domain specific details.
A quick, stupid check, as I'm really not sure I understand the discussion about English here: would using 'transcribing' in this (A/V) context prevent using 'transcribing' in other contexts? I'm thinking especially of it as a motivation for annotations that represent transcriptions of text, such as in projects like http://blogs.ucl.ac.uk/transcribe-bentham/
Yes, that's the intent of it :) The use of `transcribing` is to signal to the client that the annotation represents information that is already present in the rendering (it could be a transcription of spoken language) and can be presented differently to a `painting` annotation.
Thanks for the explanation, Rob! I am only half re-assured though. Is your answer hinting that everything that can be presented differently from a painting annotation will be expected to have the transcribing motivation?
Editorial question: Do "painting" and "transcribing" get defined in Section 3 along with the properties? Otherwise where?
In the current draft, `painting` is used in section 5.1 before it is first properly mentioned in 5.3 (Canvas) and 5.5 (Annotations). What about a slightly expanded (but still brief) discussion of annotations earlier on, where the special IIIF motivations of `painting` and now `transcribing` could be introduced?
2.1 defines Content as one of the four basic types, then 2.2 defines Annotation as one of the additional types. Somewhere here, there could be a brief section that establishes annotation as the mechanism for linking to content, getting some of the work of 5.5 done earlier on in the spec, and establishing annotations as content as a foundation of the model. I feel lots of people miss this idea when they first encounter IIIF. By the time the reader gets to section 3, they should have some notion of this special role of annotations in the spec, with the various further mentions in 5 doing the legwork.
This may require a bit of a rewrite of 1 and 2. Section 1 needs broadening anyway for non-image use cases (although images should still feature in the majority of examples). Or it might be as simple as moving Annotation to 2.1 and discussing it and Content together.
Then, painting and transcribing could be introduced in section 3.
Just chiming in on the importance of annotation to the model; it can easily be overlooked, so it would be good to introduce annotation earlier for a better understanding of IIIF.
:+1: In 1.0 and early 2.0 days it was thought that Annotations were machinery to be hidden away from the casual user, but I think we see their centrality now after adoption and application to more than just the manuscript domain. I'll work on 1 and 2.
Agreement on AV call -
+1 to transcribing, but we need to take the general extension of motivations to a general technical IIIF call. Communities can refine motivations, e.g. for captioning, but these should be common across Avalon and other AV interests within IIIF.
Closed by #1351
From AV Working Group:
Although a client might render captions (e.g., on a video) in the same way as other `painting` motivation annotations, a specific `captioning` motivation would allow the client to present UI to the user to choose captions, e.g., for compliance with accessibility requirements, and for a better user experience. This is a common feature of video players that a user would expect to see.