IIIF / iiif-av

The International Image Interoperability Framework (IIIF) Audio/Visual (A/V) Technical Specification Group aims to extend to A/V the benefits of interoperability and the growing ecosystem of clients and servers that IIIF provides for images. This repository contains user stories and mockups for interoperable A/V content – contributions are welcome.
http://iiif.io/community/groups/av/
Apache License 2.0

Refer to a point or range in time of the content #11

Open jronallo opened 8 years ago

jronallo commented 8 years ago

Description

Ability to refer to a point in time, with a standard syntax for addressing points (e.g. "this chord at this point"). SMPTE time codes. Sample-level access on audio files might also be needed ("this 10-sample window", or the frequency content of that window), so: SMPTE, samples, and time. HTML5 video is not great at supporting sample accuracy; it seeks to the nearest key frame based on time. One could still annotate with sample accuracy, but that would only work as metadata.

Variation(s)

Related IIIF: Image API Region, info.json, # Fragment on Canvases.

Media fragments support NPT and SMPTE (https://www.w3.org/TR/media-frags/#naming-time), and there is no problem combining t=123.45&xywh=1,2,3,4 (https://www.w3.org/TR/media-frags/#processing-name-value-lists). The media fragment spec does not specify a mandatory or canonical order of parameters. If using this in IIIF we'd probably want to mandate (or at least recommend) a canonical order to avoid the creation of duplicate URIs for the same thing.
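Such a canonical ordering could be enforced mechanically when minting URIs. A minimal sketch, assuming a t-before-xywh order (the ordering itself is an assumption; the media fragments spec leaves it open, so IIIF would have to pick one):

```python
def canonical_fragment(fragment: str) -> str:
    """Rewrite a media-fragment string into a fixed parameter order
    (t before xywh) so equivalent fragments yield one URI.
    Unknown parameters are kept, sorted after the known ones."""
    order = {"t": 0, "xywh": 1}
    pairs = [p.split("=", 1) for p in fragment.split("&")]
    pairs.sort(key=lambda kv: order.get(kv[0], len(order)))
    return "&".join(f"{name}={value}" for name, value in pairs)

print(canonical_fragment("xywh=1,2,3,4&t=123.45"))  # t=123.45&xywh=1,2,3,4
```

Both orderings of the same fragment then normalize to a single string, which is exactly the duplicate-URI problem described above.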

Film folks need frame-specific references. A more advanced player could be built to support this. Frames would need to be relative to the original scan of the film; transcoding gets … "exciting". E.g. freeze on a frame where the original physical film has a scratch. Instead of referring to a point in film by time, we can refer to it by frame as well.

SMPTE allows reference to a frame in the format hh:mm:ss:ff, e.g. 00:01:30:02 is the third frame after 1 minute 30 seconds. (The SMPTE spec is closed, but https://en.wikipedia.org/wiki/SMPTE_timecode describes the format.)
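For illustration, a non-drop-frame SMPTE timecode can be mapped to seconds once the frame rate of the original scan is known; a sketch, where the function name and the 24 fps default are assumptions, not anything from the spec:

```python
def smpte_to_seconds(timecode: str, fps: int = 24) -> float:
    """Convert a non-drop-frame SMPTE timecode (hh:mm:ss:ff) to seconds.

    `fps` must be the frame rate of the original material; transcoded
    derivatives at other rates would break the mapping (the "exciting"
    part noted above).
    """
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    if ff >= fps:
        raise ValueError(f"frame {ff} out of range for {fps} fps")
    return hh * 3600 + mm * 60 + ss + ff / fps

print(smpte_to_seconds("00:01:30:02"))  # 90 seconds plus 2/24 of a second
```

This is why frame references only stay stable relative to one specific scan: the same hh:mm:ss:ff denotes a different instant at a different frame rate.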

Source: BL workshop notes
Interest: 100%

Use Cases

jronallo commented 8 years ago

By @thehabes

When referring to audio pieces within the manifest, it was easy for our annotations to be somewhere "on" a piece of audio. We stored audio resources as an oa:Annotation and made the resource of that annotation the actual mp3 audio file, which made it very easy to connect annotations and annotation lists to it.

It would be nice if A/V resources could fit into the specs this way. Making them an annotation was a way to cheat. If something like sc:Sound existed that worked like an sc:Canvas in the specs, it would work rather smoothly (see the example below for the way we did this to make it work).

So the main point of discussion that comes from this is: how exactly would an audio/video resource be described and treated? What would be a proper motivation and @type? What is the consistent and proper way to talk about how an annotation is "on" a resource with the dimension of time (and remember, time is probably an interval, not an exact moment)?

Ex:

    // The audio resource
    {
        "@id": "/some/audioResource",
        "@type": "oa:Annotation",    // this was the hacky part; we would like this to be sc:Sound or something
        "label": "sound file",
        "motivation": "performance", // what should this be?
        "resource": {
            "@id": "media/audio/audioFile.mp3",
            "@type": "dctypes:Sound",
            "format": "audio/mpeg"
        }
    }

    {
        "@id": "/some/audio/annotation/ID",
        "@type": "oa:Annotation",
        "label": "first five seconds",
        "on": ["/some/audioResource#t=0,5.00"]
    }

For the variation aspect of this, we box an area of a sheet music canvas when the notes are playing. So when the first five seconds of the music notes are boxed by this annotation, all we had to do was let this annotation know it was "on" two resources at once:

    {
        "@id": "/some/audio/annotation/ID",
        "@type": "oa:Annotation",
        "label": "first five seconds",
        "on": [
            "/some/audioResource#t=0,5.00",
            "/some/music/sheet#xywh=546,485,186,382"
        ]
    }

So when the canvas was loaded to the screen, so was the music. Since both resources were active, when this annotation was hit, it knew to be drawn to the screen given the dimensions during a specific TIME interval the music was playing. If the canvas is clicked on, the annotation knows to load the audio to a specific time. If the audio is loaded to a specific time, it knows to make a specific drawn annotation active, and in this way we connected audio, time and drawn visuals.
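The player behavior described here can be sketched in a few lines: given a playback time, find the annotations whose time fragment covers it and return their canvas regions to draw. This is a hypothetical illustration of the sync logic, not code from the project; the function names and the fragment parsing are assumptions.

```python
import re

def parse_targets(on):
    """Split an annotation's "on" URIs into a (start, end) time range
    and an (x, y, w, h) region, where present."""
    t, xywh = None, None
    for uri in on:
        m = re.search(r"#t=([\d.]+),([\d.]+)", uri)
        if m:
            t = (float(m.group(1)), float(m.group(2)))
        m = re.search(r"#xywh=(\d+),(\d+),(\d+),(\d+)", uri)
        if m:
            xywh = tuple(int(g) for g in m.groups())
    return t, xywh

def active_regions(annotations, playback_time):
    """Canvas regions to highlight at the given audio playback time."""
    regions = []
    for anno in annotations:
        t, xywh = parse_targets(anno["on"])
        if t and xywh and t[0] <= playback_time < t[1]:
            regions.append(xywh)
    return regions
```

The inverse direction (clicking a boxed region seeks the audio to the region's t start) falls out of the same `parse_targets` mapping, which is what makes the dual-target "on" array attractive.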

azaroth42 commented 8 years ago

I don't follow the use case @thehabes is describing. Is it to annotate Audio at a certain time range, or is it to annotate (a certain time range of) audio onto a canvas (at a certain time range)?

Either are possible, but we should distinguish the different axes.

thehabes commented 8 years ago

I think this one was a little of both, so maybe it should be refined....

    {
        "@id": "/some/audioResource",
        "@type": "oa:Annotation",    // this was the hacky part; we would like this to be sc:Sound or something
        "label": "sound file",
        "motivation": "performance", // what should this be?
        "resource": {
            "@id": "media/audio/audioFile.mp3",
            "@type": "dctypes:Sound",
            "format": "audio/mpeg"
        }
    }

    {
        "@id": "/some/audio/annotation/ID",
        "@type": "oa:Annotation",
        "label": "first five seconds",
        "on": ["/some/audioResource#t=0,5.00"]
    }

This resource and annotation represent annotating audio at a certain time range (which could also just be a certain point in time). In this case, I am trying to say something about the first 5 seconds of this audio resource (although all I have here is a label).

The canvas drawing is introduced in the next iteration of the same annotation...

    {
        "@id": "/some/audio/annotation/ID",
        "@type": "oa:Annotation",
        "label": "first five seconds",
        "on": [
            "/some/audioResource#t=0,5.00",
            "/some/music/sheet#xywh=546,485,186,382"
        ]
    }

I am trying to say something about the first 5 seconds of this audio resource, and during those five seconds, I know to draw this box onto the canvas, which is a variation of the use case. However, the knowledge ahead of time that we wanted to do such a thing determined how we referred to time with an annotation, and maybe that knowledge can help here. I definitely understand splitting them up.

azaroth42 commented 8 years ago

Yup. It would be good to stick to clear descriptions of the use cases at this stage, and then discuss solutions, rather than jumping to solutions for half-understood (by the group) problems.

Use case 1:

I am trying to say something about the first 5 seconds of this audio content

Use Case 2:

During those five seconds, draw something

cubap commented 8 years ago

The conversation seems to be around selectors; I think the use cases overlap there. If the title of the issue were "Refer to a point in time of A/V resource" it would be clear this is the case. I think there may be multiple applications of the solution of this issue, but I don't think the solution is forking as much as the discussion, which may result in a cookbook full of ways to use the selector in different cases.

Too many details

The resource is defined (following a IIIF-y container scheme):

{
    "@id": "http://example.org/av/005",
    "@type": "av:Audio",            // parallel to sc:Canvas
    "label": "sound file",
    "motivation": "performance",    // IIIF A/V can pick what this is
    "duration": 45.09,              // IIIF A/V may require this, like sc:Canvas.height
    "recordings": [{                // parallel to .images
        "resource": {               // It could just be this in the simplest case...
            "@id": "http://example.org/audio/audioFile.mp3",
            "@type": "dctypes:Sound",
            "format": "audio/mpeg"
        }
    }]
}

and then you just need a selector to point to it:

{
     "@id": "http://example.org/annotation/1250",
     "@type": "oa:Annotation",
     "label": "five seconds",
     "description": "The first 5 seconds pointed at by the last 5",
     "on": "http://example.org/av/005#t=0,5",
     "resource":  "http://example.org/av/005#t=40.09,45.09"
}

When on is used it means oa:hasTarget, but the #t= selector is just a shorthand for an OA Fragment Selector, so resource (which is oa:hasBody in context) is just as willing to accept a fragment. (Note: OA has a context that calls on and resource simply target and body, respectively.)
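For comparison, the shorthand #t= target can be spelled out as an explicit FragmentSelector on a SpecificResource. A hedged sketch of the longhand form of the annotation above, following the Open Annotation model (the exact property names depend on the JSON-LD context in use):

    {
        "@id": "http://example.org/annotation/1250",
        "@type": "oa:Annotation",
        "on": {
            "@type": "oa:SpecificResource",
            "full": "http://example.org/av/005",
            "selector": {
                "@type": "oa:FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": "t=0,5"
            }
        }
    }

The longhand form declares which fragment syntax the value conforms to, which matters once both Image API regions and media fragments are in play.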

This example is probably nonsense (annotating a resource onto itself), but I have a real use case that draws measures onto manuscripts and connects those regions to the audio fragments. In that case, I use an array "on" : [ "MS#xywh=546,485,186,382", "music#t=5.53,10.2" ] since the intent is not to point from one to the other, but to align the two. Best practice may emerge to suggest these are both resources, or that a specific motivation should be used, but I think this supports the intent of OAC.

cubap commented 8 years ago

To the original intent of the post, I would consider the various A/V resources as independent from the IIIF Manifest, which is intended to arrange images. Because annotations can link resources together reliably, what may be most important is a well-described and annotatable resource, like sc:Canvas, that standardizes the resource to enable reliable annotation even when the underlying resource is lost or changed.

References between multiple representations: image, audio, video, music notation.

With good resources for each, an XPath selector for MEI and an xywh region in a canvas could be annotated onto a t for an audio resource and a t, xywh of a video without ruffling any feathers.

As a student, I want to cite a particular time range of a video for my paper.

This gets into the more interesting bits about how to encapsulate the video—that may be a long discussion. IIIF ignores image format and resolution by forcing a ratio no matter what. The t= in audio marking makes the same dodge around specific resolution or versions. In video, there may be a need for a hybrid or divergent description of objects, since most online resources for exhibit will be compressed videos (I assume in ignorance), and t+xywh is enough, but a specific reference to a direct frame is more similar to a measure or note reference in MEI, so perhaps a resource that produces the video is different from the one that describes the film.

This does indeed, as Rob suggests, create multiple use cases under this issue. If this issue is proliferating stories, they may be best captured somewhere else. Insofar as it asks for clear protocol on how to refer to a fragment, I think the simple answer is "Use a selector." This answer defers the definition of what resource this selector may target. Integration with Mirador or similar viewers may ultimately drive that answer.

zimeon commented 8 years ago

Earlier comments in this issue point to need for canonical form of selector values, e.g. use t=5 and not t=5.00.

zimeon commented 8 years ago

Previous comment intended to say we should have a canonical "no trailing zeros", and not to limit to integer seconds. To give perhaps a better example, use t=5.123 and not t=5.1230.
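A canonical serialization like this could be produced with a small formatter. A sketch; the three-decimal precision cap is an assumption for illustration, not anything the spec mandates:

```python
def canonical_t(seconds: float) -> str:
    """Format a time value with no trailing zeros and no trailing
    decimal point, e.g. 5.0 -> "5", 5.1230 -> "5.123".
    Precision is capped at milliseconds (an assumed choice)."""
    return f"{seconds:.3f}".rstrip("0").rstrip(".")

print(canonical_t(5.0))    # 5
print(canonical_t(5.123))  # 5.123
```

Mandating one such serialization means t=5, t=5.0, and t=5.00 can no longer produce three distinct URIs for the same instant.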

workergnome commented 7 years ago

I think it's useful to think about point+duration here as the way to define this.

I'd love to be able to say "I want this moment, with duration 0, as a JPEG."