IIIF / iiif-av

The International Image Interoperability Framework (IIIF) Audio/Visual (A/V) Technical Specification Group aims to extend to A/V the benefits of interoperability and the growing ecosystem of clients and servers that IIIF provides for images. This repository contains user stories and mockups for interoperable A/V content – contributions are welcome.
http://iiif.io/community/groups/av/
Apache License 2.0

Prototype info.jsons #50

Open jronallo opened 7 years ago

jronallo commented 7 years ago

Description

We would like to see some proposals for what folks think an info.json for A/V might look like so that we can make further decisions on what we need in an information package about a video or audio recording.

Variation(s)

Yes! Propose solutions. :smile:

Additional Background

Notes from the Hague meeting that can help to inform potentially productive directions. https://docs.google.com/document/d/1cnkOPm7rC9uKeSxorFpu004ZzJxolVLWex5NdxMKnHY/edit

jronallo commented 7 years ago

Here's a prototype with some examples of what an info.json for video might look like: https://iiif-staging02.lib.ncsu.edu/iiifv/viewer

Here's one of the examples: https://iiif-staging02.lib.ncsu.edu/iiifv/pets/info.json

Note that I'm using id instead of @id which is likely to be the direction that future IIIF specifications go.

There are three properties worth taking a look at: sources, tracks, and thumbnail. In sources I've described each video source separately with information which could be useful in determining whether to display one source rather than another (and potentially switch between versions as appropriate). These may not all be properties for a video source we want to include or there might be others that we include in addition to these. The id for each source is the URL to the bitstream which would allow using the video we already publish to the web that sits on our own servers or another host (say a CDN). The type property tries to be as specific as possible with the content type and codecs of the video to allow for more exact testing of whether the browser can play the source or not. The duration is listed each time as it can vary ever so slightly for different files. Not all video formats (like webm) make it easy to get the number of frames.
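As a sketch of how a client might consume such a sources list (field names follow the prototype above; the selection rule and the `can_play` callback are just illustrations — in a browser client something like `HTMLMediaElement.canPlayType` would fill that role):

```python
# Illustrative only: pick a playable source from the prototype's
# "sources" array, preferring larger renditions.
def choose_source(sources, can_play):
    # Sort by width, largest first, then take the first playable one.
    ordered = sorted(sources, key=lambda s: s.get("width", 0), reverse=True)
    for source in ordered:
        if can_play(source["type"]):
            return source["id"]
    return None

sources = [
    {"id": "pets-720x480.webm", "width": 720,
     "type": 'video/webm; codecs="vp8,vorbis"'},
    {"id": "pets-360x240.mp4", "width": 360,
     "type": 'video/mp4; codecs="avc1.64000D,mp4a.40.2"'},
]

# A client that can only play MP4 would fall through to the MP4 source:
print(choose_source(sources, lambda t: t.startswith("video/mp4")))
```

This is where the full content type with codecs pays off: the capability check can be as precise as the player allows.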

While tracks like captions and subtitles might be considered most appropriate for a Presentation manifest, for some users these external tracks (WebVTT files in this case) are actually how they experience the video as fully as possible. (Some video containers have a way to include chapters and subtitles directly in the file right along with the images and audio of the video. It is just that in this case the captions and subtitles are separate files.) I've included the tracks in the info.json so that a complete/accessible, if simplified, video experience can be given with just the info.json. Note that descriptive data like labels are not included with each track as labels do seem to be more of a Presentation concern.
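To illustrate the "complete, accessible experience from just the info.json" point, here is a sketch that renders the sources and tracks arrays into standard HTML5 `<video>`/`<source>`/`<track>` markup (the track `kind` field name follows the example; the function itself is hypothetical):

```python
# Sketch: build a basic captioned HTML5 player from an info.json dict.
def video_element(info):
    lines = ["<video controls>"]
    for s in info.get("sources", []):
        # Single-quote the type attribute: codec strings contain double quotes.
        lines.append("  <source src=\"%s\" type='%s'>" % (s["id"], s["type"]))
    for t in info.get("tracks", []):
        lines.append('  <track src="%s" kind="%s" srclang="%s">' %
                     (t["id"], t["kind"], t["language"]))
    lines.append("</video>")
    return "\n".join(lines)
```

The browser then handles source selection and caption display itself, which is the simplified-but-complete experience described above.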

These tracks could also be expressed as annotations, which is likely how they would be managed and delivered for purposes of search inside. But since there are already caption files that institutions have created, I wanted a way to easily reuse them without much work. There are also certain features to captions that might not be easily included in annotations.

Finally, the thumbnail links to a particular image from the video that has been chosen as a good representative image and could be used as a poster image. Within the thumbnail is a service block pointing to a full image service for the video. See issue https://github.com/IIIF/iiif-av/issues/54 for why still images for a video are complicated.

Thoughts on this prototype? What's missing? What won't work? What would you like to see in an info.json for a video?

zimeon commented 7 years ago

For easy reference, a pretty-printed version of @jronallo 's https://iiif-staging02.lib.ncsu.edu/iiifv/pets/info.json is at https://gist.github.com/zimeon/29e829730576d188897cccb5ea8e97dd

I am in favor of the source style description of different derivatives, e.g.:

...
  "sources": [
    {
      "format": "webm",
      "height": 480,
      "width": 720,
      "size": "3360808",
      "duration": "35.627000",
      "type": "video/webm; codecs=\"vp8,vorbis\"",
      "id": "https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.webm"
    },
...

rather than using a parameterized URI style like we do for the Image API.

mcwhitaker commented 7 years ago

We (Brian Keese and I) followed @jronallo 's example for one of our videos and ended up with this:

{"_comments":["This is an avalon video"],
"@context":"http://iiif.io/api/video/0/context.json",
"id":"https://iiif.dlib.indiana.edu/avalon/3140",
"profile":"http://iiif.io/api/video/0/level0.json",
"attribution":"Statement of Responsibility here",
"sources":[
          {"id":"http://mallorn.dlib.indiana.edu/avalon/lunchroom_manners_512kb.mp4",
            "width":480,
            "height":360,
            "duration":"572034",
            "video_bitrate":"1000000.0",
            "audio_bitrate":"96000.0",
            "type":"video/mp4; codecs=\"avc1.42E01E,mp4a.40.240\"",
            "format":"mp4",
            "size":"78666688",
            "frames":""}],
"tracks":[
    {"id":"https://iiif.dlib.indiana.edu/avalon/3140/lunchroom_manners_512kb.vtt",
     "kind":"captions",
     "language":"en"}],
"poster":
       {"id":"https://iiif.dlib.indiana.edu/avalon/3140/2/full/full/0/default.jpg",
        "type":"Image",
        "format":"image/jpeg",
        "width":480,
        "height":360,
        "service":{"@context":"http://iiif.io/api/image/2/context.json",
                   "id":"https://iiif.dlib.indiana.edu/avalon/poster",
                   "profile":"http://iiif.io/api/image/2/level2.json"}},
"thumbnail":
       {"id":"https://iiif.dlib.indiana.edu/avalon/3140/2/full/full/0/default.jpg",
        "type":"Image",
        "format":"image/jpeg",
        "width":160,
        "height":120,
        "service":{"@context":"http://iiif.io/api/image/2/context.json",
                   "id":"https://iiif.dlib.indiana.edu/avalon/thumbnail",
                   "profile":"http://iiif.io/api/image/2/level2.json"}}}

The fields we felt the need to add had to do with bitrate. We added separate video_bitrate and audio_bitrate properties, but just as codecs combines its values as "video,audio", I suppose bitrate could do the same in a single property. I don't know a reason to prefer one over the other.

Eyal-R commented 7 years ago

I think bitrate can be dynamic, so it's better not to include it in the info.json.

I propose some changes, including:

{

"_comments":[
    "Mockup of info.json for IxIF, based on https://github.com/IIIF/iiif-av/issues/50."
],
"@context":"http://iiif.io/api/video/0/context.json",
"id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets",
"attribution":"Brought to you by NCSU Libraries",
"license":"Public domain",
"logo":"http://iiif.lib.ncsu.edu/iiifv/logo/full/full/0/default.jpg",

"sources":[
    {
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.webm",
        "width":720,
        "height":480,
        "duration":"35.627000",
        "type":"video/webm; codecs=\"vp8,vorbis\"",
        "format":"webm",
        "size":"3360808"
    },
    {
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.mp4",
        "width":720,
        "height":480,
        "duration":"35.627000",
        "type":"video/mp4; codecs=\"avc1.42E01E,mp4a.40.2\"",
        "format":"mp4",
        "size":"2924836",
        "frames":"1067"
    },
    {
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-360x240.mp4",
        "width":360,
        "height":240,
        "duration":"35.648000",
        "type":"video/mp4; codecs=\"avc1.64000D,mp4a.40.2\"",
        "format":"mp4",
        "size":"1075972",
        "frames":"1067"
    }
],
"tracks":[
    {
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets/track/captions-en.vtt",
        "class":"captions",
        "language":"en"
    },
    {
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifv/pets/track/subtitles-nl.vtt",
        "class":"subtitles",
        "language":"nl"
    }
],
"thumbnail":{
    "id":"https://iiif-staging02.lib.ncsu.edu/iiifvi/pets/2/full/full/0/default.jpg",
    "type":"Image",
    "format":"image/jpeg",
    "width":720,
    "height":480,
    "service":{
        "@context":"http://iiif.io/api/image/2/context.json",
        "id":"https://iiif-staging02.lib.ncsu.edu/iiifvi/pets",
        "profile":"http://iiif.io/api/image/2/level2.json"
    }
},
"profile":[
    "http://iiif.io/api/video/0/level0.json",
    {
        "formats":["mp4","webm"],
        "qualities":["color","gray"],
        "supports":["rotationArbitrary","jumpToPointInTime"]
    }
]
}

I welcome your feedback.

azaroth42 commented 7 years ago

URI pattern:

http://server/prefix/identifier/timeRegion/spaceRegion/timeSize/spaceSize/rotation/quality.format

Though spaceRegion (I want this region of the visual part), timeSize (I want it sped up/slowed down to take this long) and rotation could easily be dropped, consistency and future proofing would lead me to include them with filler values such as "max".

{
  "@context": "http://iiif.io/api/av/1/context.json",
  "id": "http://example.edu/iiif/identifier",
  "attribution": "",
  "license": "",
  "logo": "",

  "height": 960,
  "width": 1440,
  "duration": 35.6,

  "sizes": [{"width": 720, "height": 480}, {"width": 360, "height": 240}],

  "profile": [
     "http://iiif.io/api/av/1/level0.json",
     {
        "maxWidth": 720,
        "formats": [
           {"value": "webm", "contentType": "video/webm; codecs=\"vp8,vorbis\""},
           {"value": "mp4", "contentType": "video/mp4; codecs=\"avc1.64000D,mp4a.40.2\""}
        ],
        "qualities": ["color", "gray"]
    }
  ],

  "seeAlso": {
    "id": "http://example.edu/nonIIIF/vtt/identifier.vtt",
    "format": "application/webvtt"
  }
}

With no additional supports features (such as "arbitrarySize", "arbitraryRegion", etc.), this would then allow the construction of the URIs:

http://example.edu/iiif/identifier/max/max/max/max/0/color.webm
http://example.edu/iiif/identifier/max/max/max/max/0/color.mp4
http://example.edu/iiif/identifier/max/max/max/360,240/0/color.webm
http://example.edu/iiif/identifier/max/max/max/360,240/0/color.mp4
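As a sketch, filling the proposed pattern with "max" defaults as described above would produce exactly those URIs (the helper name and default values are my assumption, not part of the proposal):

```python
# Hypothetical helper for the proposed pattern:
# {base}/{identifier}/{timeRegion}/{spaceRegion}/{timeSize}/{spaceSize}/{rotation}/{quality}.{format}
def av_uri(base, identifier, fmt, quality="color",
           time_region="max", space_region="max",
           time_size="max", space_size="max", rotation="0"):
    parts = [base, identifier, time_region, space_region,
             time_size, space_size, rotation, "%s.%s" % (quality, fmt)]
    return "/".join(parts)

print(av_uri("http://example.edu/iiif", "identifier", "webm"))
# http://example.edu/iiif/identifier/max/max/max/max/0/color.webm

print(av_uri("http://example.edu/iiif", "identifier", "mp4",
             space_size="360,240"))
# http://example.edu/iiif/identifier/max/max/max/360,240/0/color.mp4
```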

jronallo commented 7 years ago

Regarding the example given above by @azaroth42

See https://github.com/IIIF/iiif-av/issues/58 for why it is good to include the codecs parameter for content type, how it works, and why specifying only one codecs parameter probably won't work for MP4.

With those sizes and that profile, does it mean that I am advertising that I have the following video available? http://example.edu/iiif/identifier/max/max/max/360,240/0/gray.mp4 And every variation of the size, format, and qualities parameters? What if I only have one gray version in one codec at a particular size?

In the future I plan on only providing webm as a fallback and in only one size. Is the above saying that I am advertising that it is available for both sizes and both qualities? How can I say that I only have the webm available in a particular size?

How would I say that I have a source which is HLS or MPEG-DASH? There could be any number of adaptive sizes and bitrates for that. They could be sizes that are not the same as the static derivatives that I provide via progressive download. Would you just list "application/dash+xml" as a format and ignore sizes? Would I always just ignore most of the parameters for MPEG-DASH since a single MPEG-DASH can include several different sizes and bitrates (as well as separate audio and video streams)?

Videos created from the same source video will have slightly different durations. Each source may have a slightly different duration. Is that important to know?

Is there a use case for video qualities like gray? The only use cases I see that require this form of server API are download ones and they're for time and region segments. Are there others? https://github.com/IIIF/iiif-av/issues/4 https://github.com/IIIF/iiif-av/issues/7

azaroth42 commented 7 years ago

Following from discussion on today's call, and @tomcrane's document:

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "id": "http://example.org/video/canvas/1",
  "type": "Canvas",
  "height": 480,
  "width": 720,
  "duration": 35.63,
  "attribution": "Brought to you by NCSU Libraries",
  "thumbnail": {
    "id": "https://iiif-staging02.lib.ncsu.edu/iiifvi/pets/2/full/full/0/default.jpg",
    "type": "Image",
    "format": "image/jpeg",
    "service": {
      "@context": "http://iiif.io/api/image/2/context.json",
      "id": "https://iiif-staging02.lib.ncsu.edu/iiifvi/pets",
      "profile": "http://iiif.io/api/image/2/level2.json"
    }
  },
  "resources": {
    "type": "AnnotationList",
    "items": [
      {
        "type": "Annotation",
        "body": {
           "type": "Choice",
           "items": [
              {
                "id": "https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-720x480.mp4",
                "duration": 35.627000,
                "format": "video/mp4; codecs=\"avc1.42E01E,mp4a.40.2\"",
                "width": 720,
                "height": 480
              },
              // ... other video representations here
            ]
        },
       "target": "http://example.org/video/canvas/1"
      }
    ]
  },
  "otherContent": [
    {
      "id": "http://uri-for-subtitles-as-annotations.json",
      "type": "AnnotationList",
      "service/seeAlso?": {
         "id": "https://iiif-staging02.lib.ncsu.edu/iiifv/pets/pets-captions-en.vtt",
         "format": "text/web-vtt or whatever it is",
         "language": "en"
      }
    }
  ]      
}

And then the transformation of the example .vtt into annotations is very easy, with media fragments on the canvas uri for the temporal regions, but the original is linked from the list for systems that can use it.

azaroth42 commented 7 years ago

And the code to turn VTT into annotations is trivial:

from pyvtt import WebVTTFile
import json

vtt = WebVTTFile.open("pets-en.vtt")
canvas = "http://example.org/canvas/1"
annos = []
for caption in vtt:
    # Target the canvas with a media fragment for the caption's time range
    tgt = "%s#t=npt:%s,%s" % (canvas, caption.start, caption.end)
    annos.append({
        "@type": "Annotation",
        "motivation": "painting",
        "body": {"value": caption.text},
        "target": tgt,
    })
al = {"@type": "AnnotationList", "items": annos}
print(json.dumps(al, sort_keys=True, indent=4))

Eyal-R commented 7 years ago

I think it's essential to include the Rights and Licensing fields, as in Image API's info.json ( http://iiif.io/api/image/2.1/#rights-and-licensing-properties ) as demonstrated in the example I've included in this thread.

azaroth42 commented 7 years ago

@Eyal-R Yes, missed that, sorry! It would work exactly as it does in the presentation API we have today, so exactly as per the examples above. Edited the example to add in the attribution from Jason's example.