jronallo opened 8 years ago
By @thehabes
When referring to audio pieces within the manifest, it was easy for our annotations to be somewhere "on" a piece of audio. We stored audio resources as an oa:Annotation and made the resource of that annotation the actual MP3 audio file, which made it very easy to connect annotations and annotation lists to it.
It would be nice if A/V resources could fit into the specs this way. Making them an annotation was a way to cheat. If something existed like sc:Sound that worked like an sc:Canvas in the specs, it would work rather smoothly (see the example below for the way we did this to make it work).
So the main point of discussion that comes from this is how exactly an audio/video resource would be described and treated. What would be a proper motivation and @type? What is the consistent and proper way to talk about how an annotation is "on" a resource with the dimension of time (and remember, time is probably an interval, not an exact moment)?
Ex:
```
// The audio resource
{
  "@id": "/some/audioResource",
  "@type": "oa:Annotation", // this was the hacky part; we would like this to be sc:Sound or something
  "label": "sound file",
  "motivation": "performance", // what should this be?
  "resource": {
    "@id": "media/audio/audioFile.mp3",
    "@type": "dctypes:Sound",
    "format": "audio/mpeg"
  }
}
```
```
{
  "@id": "/some/audio/annotation/ID",
  "@type": "oa:Annotation",
  "label": "first five seconds",
  "on": ["/some/audioResource#t=0,5.00"]
}
```
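For reference, the `#t=0,5.00` suffix is a W3C Media Fragments temporal selector in "normal play time" (NPT) form. A minimal sketch of how a client might split such a target URI into its base resource and time interval (the function name and return shape are my own, not part of any IIIF library):

```python
from urllib.parse import urldefrag

def parse_t_fragment(uri):
    """Split a target URI into (base_uri, start, end) in seconds.

    Handles the NPT form 't=start,end' used in this thread; start or
    end is None when that side of the interval is left unbounded.
    """
    base, frag = urldefrag(uri)
    start = end = None
    for part in frag.split("&"):
        if part.startswith("t="):
            times = part[2:].split(",")
            if times[0]:
                start = float(times[0])
            if len(times) > 1 and times[1]:
                end = float(times[1])
    return base, start, end

print(parse_t_fragment("/some/audioResource#t=0,5.00"))
# → ('/some/audioResource', 0.0, 5.0)
```

A viewer could then seek the underlying mp3 to `start` and stop playback at `end`.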
For the variation aspect of this, we box an area of a sheet music canvas when the notes are playing. So when the first five seconds of the music notes are boxed by this annotation, all we had to do was let this annotation know it was "on" two resources at once:
```
{
  "@id": "/some/audio/annotation/ID",
  "@type": "oa:Annotation",
  "label": "first five seconds",
  "on": [
    "/some/audioResource#t=0,5.00",
    "/some/music/sheet#xywh=546,485,186,382"
  ]
}
```
So when the canvas was loaded to the screen, so was the music. Since both resources were active, when this annotation was hit it knew to be drawn to the screen at the given dimensions during the specific time interval the music was playing. If the canvas is clicked, the annotation knows to load the audio to a specific time. If the audio is loaded to a specific time, it knows to make a specific drawn annotation active, and in this way we connected audio, time, and drawn visuals.
I don't follow the use case @thehabes is describing. Is it to annotate Audio at a certain time range, or is it to annotate (a certain time range of) audio onto a canvas (at a certain time range)?
Either are possible, but we should distinguish the different axes.
I think this one was a little of both, so maybe it should be refined....

```
{
  "@id": "/some/audioResource",
  "@type": "oa:Annotation", // this was the hacky part; we would like this to be sc:Sound or something
  "label": "sound file",
  "motivation": "performance", // what should this be?
  "resource": {
    "@id": "media/audio/audioFile.mp3",
    "@type": "dctypes:Sound",
    "format": "audio/mpeg"
  }
}
```
```
{
  "@id": "/some/audio/annotation/ID",
  "@type": "oa:Annotation",
  "label": "first five seconds",
  "on": ["/some/audioResource#t=0,5.00"]
}
```

This resource and annotation represent annotating audio at a certain time range (which could also just be a certain point in time). In this case, I am trying to say something about the first 5 seconds of this audio resource (although all I have here is a label).
The canvas drawing is introduced in the next iteration of the same annotation...

```
{
  "@id": "/some/audio/annotation/ID",
  "@type": "oa:Annotation",
  "label": "first five seconds",
  "on": [
    "/some/audioResource#t=0,5.00",
    "/some/music/sheet#xywh=546,485,186,382"
  ]
}
```

I am trying to say something about the first 5 seconds of this audio resource, and during those five seconds, I know to draw this box onto the canvas, which is a variation of the use case. However, the knowledge ahead of time that we wanted to do such a thing determined how we referred to time with an annotation, and maybe that knowledge can help here. I definitely understand splitting them up.
Yup. It would be good to stick to clear descriptions of the use cases at this stage, and then discuss solutions, rather than jumping to solutions for half-understood (by the group) problems.
Use case 1:
I am trying to say something about the first 5 seconds of this audio content
Use Case 2:
During those five seconds, draw something
The conversation seems to be around selectors; I think the use cases overlap there. If the title of the issue were "Refer to a point in time of A/V resource" it would be clear this is the case. I think there may be multiple applications of the solution of this issue, but I don't think the solution is forking as much as the discussion, which may result in a cookbook full of ways to use the selector in different cases.
The resource is defined (following a IIIF-y container scheme):

```
{
  "@id": "http://example.org/av/005",
  "@type": "av:Audio", // parallel to sc:Canvas
  "label": "sound file",
  "motivation": "performance", // IIIF A/V can pick what this is
  "duration": 45.09, // IIIF A/V may require this, like sc:Canvas.height
  "recordings": [{ // parallel to .images
    "resource": { // It could just be this in the simplest case...
      "@id": "http://example.org/audio/audioFile.mp3",
      "@type": "dctypes:Sound",
      "format": "audio/mpeg"
    }
  }]
}
```
and then you just need a selector to point to it:
```
{
  "@id": "http://example.org/annotation/1250",
  "@type": "oa:Annotation",
  "label": "five seconds",
  "description": "The first 5 seconds pointed at by the last 5",
  "on": "http://example.org/av/005#t=0,5",
  "resource": "http://example.org/av/005#t=40.09,45.09"
}
```
When `on` is used it means `oa:hasTarget`, but the `#t=` selector is just a shortcut for OAC Fragment Selectors, so `resource` (which is `oa:hasBody` in context) is just as willing to accept a fragment. (Note: OAC has a context that calls `on` and `resource` simply `target` and `body`, respectively.)
This example is probably nonsense (annotating a resource onto itself), but I have a real use case that draws measures onto manuscripts and connects those regions to the audio fragments. In that case, I use an array `"on": ["MS#xywh=546,485,186,382", "music#t=5.53,10.2"]` since the intent is not to point from one to the other, but to align the two. Best practice may emerge to suggest these are both `resource`s, or that a specific `motivation` should be used, but I think this supports the intent of OAC.
To the original intent of the post, I would consider the various A/V resources as independent from the IIIF Manifest, which is intended to arrange images. Because annotations can link resources together reliably, what may be most important is a well-described and annotatable resource, like `sc:Canvas`, that standardizes the resource to enable reliable annotation, even when the underlying resource is lost or changed.
References between multiple representations: image, audio, video, music notation.
With good resources for each, an XPath selector for MEI and an `xywh` region in a canvas could be annotated onto a `t` for an audio resource and a `t`,`xywh` of a video without ruffling any feathers.
As a student, I want to cite a particular time range of a video for my paper.
This gets into the more interesting bits about how to encapsulate the video, which may be a long discussion. IIIF ignores image format and resolution by forcing a ratio no matter what, and the `t=` in audio marking makes the same dodge around specific resolutions or versions. In video there may be a need for a hybrid or divergent description of objects: most online resources for exhibit will be compressed videos (I assume in ignorance), and `t`+`xywh` is enough for those, but a specific reference to a direct frame is more similar to a measure or note reference in MEI, so perhaps a resource that produces the video is different from the one that describes the film.
This does indeed, as Rob suggests, create multiple use cases under this issue. If this issue is proliferating stories, they may be best captured somewhere else. Insofar as it asks for a clear protocol on how to refer to a fragment, I think the simple answer is "Use a selector." This answer defers the definition of what resource this selector may `target`. Integration with Mirador or similar viewers may ultimately drive that answer.
Earlier comments in this issue point to the need for a canonical form of selector values, e.g. use `t=5` and not `t=5.00`.
The previous comment intended to say we should have a canonical "no trailing zeros" rule, not to limit values to integer seconds. To give perhaps a better example, use `t=5.123` and not `t=5.1230`.
I think it's useful to think about point+duration here as the way to define this.
I'd love to be able to say "I want this moment, with duration 0, as a JPEG."
Description
Ability to refer to a point in time. Standard syntax for addressing points. Want to say this chord at this point. SMPTE time codes. Might also need sample level access on audio files -- this 10 sample window. Frequency content of the window. SMPTE, Samples and Time. HTML5 video not great at supporting sample accuracy, get nearest key frame based on time. Could annotate with sample accuracy, but just works as metadata.
Variation(s)
Additional Background
Related IIIF: Image API Region, info.json, `#` fragments on Canvases. Media fragments support NPT and SMPTE (https://www.w3.org/TR/media-frags/#naming-time), and there is no problem combining `t=123.45&xywh=1,2,3,4` (https://www.w3.org/TR/media-frags/#processing-name-value-lists). The media fragment spec does not specify a mandatory or canonical order of parameters. If using this in IIIF we'd probably want to mandate (or at least recommend) a canonical order to avoid the creation of duplicate URIs for the same thing.
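Since the Media Fragments spec leaves parameter order open, a IIIF profile could simply fix one. A sketch of such a normalization (the time-before-region order is an assumption for illustration, not anything IIIF has specified):

```python
def canonical_fragment(frag):
    """Normalize a media fragment like 'xywh=1,2,3,4&t=123.45' into a
    fixed parameter order, so equivalent fragments yield a single URI."""
    order = {"t": 0, "xywh": 1}  # assumed profile order: time, then region
    pairs = [p.split("=", 1) for p in frag.split("&") if p]
    # unknown dimensions sort after the known ones, keeping their relative order
    pairs.sort(key=lambda kv: order.get(kv[0], len(order)))
    return "&".join(f"{k}={v}" for k, v in pairs)

print(canonical_fragment("xywh=1,2,3,4&t=123.45"))
# → t=123.45&xywh=1,2,3,4
```

With this in place, `#t=123.45&xywh=1,2,3,4` and `#xywh=1,2,3,4&t=123.45` deduplicate to one URI.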
Film folks need frame-specific references. One can build a more advanced player that supports them. Frames would need to be relative to the original scan of the film; transcoding gets … "exciting". E.g. freeze on a frame where the original physical film has a scratch. Instead of referring to a point in film by time, we can do it by frame as well.
SMPTE allows reference to a frame in the format hh:mm:ss:ff, e.g. 00:01:30:02 is the third frame at 1 minute 30 seconds. (The SMPTE spec is closed, but https://en.wikipedia.org/wiki/SMPTE_timecode describes it.)
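For non-drop-frame timecode at a known frame rate, the absolute frame index is simple arithmetic. A sketch (drop-frame NTSC timecode needs extra correction and is deliberately ignored here):

```python
def smpte_to_frame(timecode, fps):
    """Convert non-drop-frame SMPTE 'hh:mm:ss:ff' to an absolute frame number.

    ff is the zero-based frame within the second, so 00:01:30:02 is the
    third frame at 1 minute 30 seconds.
    """
    hh, mm, ss, ff = (int(x) for x in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

print(smpte_to_frame("00:01:30:02", 24))
# 90 seconds * 24 fps + 2 → 2162
```

This only holds against the original scan's frame rate; after transcoding, the same timecode may land on a different frame.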
Source: BL workshop notes. Interest: 100%
Use Cases