still_picture =1 should not be a requirement for image sequences

aklemets commented 6 years ago

When HEVC is used in a HEIF image sequence, non-IDR frames are allowed. I would expect the same to be allowed with AV1.

cconcolato commented 6 years ago

I'm wondering what the difference would then be between an image sequence and a video sequence. Should image sequences be restricted to not contain any frames with show_frame = 0 (i.e. the equivalent of B-frames)?

cconcolato commented 6 years ago

More broadly, as discussed during today's AOM Image call:

Is there a reason to have a difference between a video sequence and an image sequence (besides profiles that can be handled by brands)?
Should we simply say that an image sequence is a video sequence? There might be optimizations that can be done in decoders (e.g. memory consumption) when processing only intra frames. Maybe we should handle such limitations in profile definitions.

aklemets commented 6 years ago

As I recall, for HEVC in HEIF, there is no significant difference between an image sequence track and a video track. The timestamps for the image sequence track are “advisory”, meaning that a client is not required to strictly adhere to timing when playing the image sequence track.

Also, there are additional boxes to help the client decode P/B image frames in an image sequence track. There is a “Direct Reference” Sample Group box which specifies which other frames have to be fed to the decoder first, before the desired image frame can be decoded. (The box also specifies whether a frame can be decoded independently of any other frames.) These boxes are part of the base HEIF spec, so they should apply to AV1 as well.

Anders
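
The dependency lookup described in the comment above can be sketched roughly as follows. This is only an illustration, not the HEIF box syntax: the `direct_refs` mapping and the `samples_needed` function are hypothetical names, and the sketch assumes that samples are numbered in decoding order, that references only point to earlier samples, and that references must be followed transitively.

```python
from typing import Dict, List


def samples_needed(target: int, direct_refs: Dict[int, List[int]]) -> List[int]:
    """Return the samples to feed to the decoder, in decoding order.

    `direct_refs` is a hypothetical mapping from a sample number to the
    sample numbers it directly references (an empty list means the sample
    is independently decodable). The result always ends with `target`.
    """
    needed = set()
    stack = [target]
    while stack:
        sample = stack.pop()
        if sample in needed:
            continue  # already scheduled, avoid revisiting
        needed.add(sample)
        # Follow the direct references of this sample as well.
        stack.extend(direct_refs.get(sample, []))
    # Assumption: sample numbers follow decoding order, so sorting suffices.
    return sorted(needed)
```

With a mapping such as `{0: [], 1: [0], 2: [1], 3: [0]}`, sample 3 needs only sample 0 decoded first, while sample 2 needs samples 0 and 1.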

garysull commented 6 years ago

Agreeing with Anders here, HEIF can contain coded media things, each of which is identified as an "image", "image collection", or "image sequence" (along with audio and other stuff). Each picture is independently coded if it is an "image" or part of an "image collection", and for AVC and HEVC, "image sequences" can use inter-picture prediction if desired by the encoder. I don't see why this should be different for AV1. There could certainly be a big coding efficiency benefit for using inter-picture prediction for "image sequences" in some uses. I believe there is also no prohibition of non-output pictures or out-of-order output pictures for HEVC, and offhand I don't see a reason to prohibit them for AV1.

(I think there is some imprecision in some of the above comments about what a B frame is; in AVC and HEVC, a B frame is not entirely the same as it was in MPEG-2. The display order of a B frame in AVC and HEVC is not any different than it is for any other frames, and the number of pictures that a B frame depends on for its decoding process is also not any different than for a P frame.)

If it would be preferable for the pictures to be independently decodable in some use case, is there a reason that use case couldn't use an "image collection" rather than an "image sequence"?

agrange commented 6 years ago

It seems to me that an "image sequence" is really just a "coded video sequence" and can deploy the full capabilities of the AV1 bitstream.

It is the fact that it is wrapped up in an "image sequence track" rather than a "video track" that tells the decoder that timestamps are advisory.

How are "image sequence tracks" handled from the leveling perspective? I assume the level/tier signaled in the sequence header would be expected to be respected?

cconcolato commented 6 years ago

> How are "image sequence tracks" handled from the leveling perspective? I assume the level/tier signaled in the sequence header would be expected to be respected?

Yes. The question is what the level limit should be for the MIAF AV1 Baseline Profile definition. Should it be different for image sequence tracks and video tracks? If so, on what grounds?

agrange commented 6 years ago

AV1 does not define separate levels for still pictures. Presumably an image sequence track would ignore the "real-time" aspects of the level definition, whereas the video track enforces it in full. This will make the level definition more usable and result in fewer sequences being tagged as level = 31.

cconcolato commented 6 years ago

Do you mean that:

for image sequence tracks, if the decoder model is applied, only the resource availability mode matters?
and that for level 31, the decoding model must operate in resource availability mode?

agrange commented 6 years ago

No - because resource availability mode still checks that the decoder can keep up with a display rate. No such rate is defined for an image sequence track (correct?).

The intention of level 31, IIRC, was that it is a "best efforts" approach. The decoder will do its best but there are no guarantees.

cconcolato commented 6 years ago

Discussed during AOMI meeting.

There is no difference in AVIF between an image sequence and a video sequence. As defined in ISOBMFF/HEIF, the 'pict' track handler indicates that timing is advisory, while the 'vide' track handler indicates that timing is mandatory. The general section of the specification should not constrain the contents of a track. Profiles, identified by brands, may do that, e.g. by saying that inter-prediction is not used.
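
As a rough illustration of the handler distinction above, a reader could inspect the handler type of a track's `hdlr` box as sketched below. The function names `handler_type` and `timing_is_advisory` are hypothetical, and the sketch assumes a complete `hdlr` box with a 32-bit size field (no 64-bit `largesize`).

```python
import struct


def handler_type(hdlr_box: bytes) -> str:
    """Extract the four-character handler type from a serialized 'hdlr' box."""
    if len(hdlr_box) < 20:
        raise ValueError("box too short to contain a handler type")
    _size, box_type = struct.unpack(">I4s", hdlr_box[:8])
    if box_type != b"hdlr":
        raise ValueError("not a hdlr box")
    # Layout after the 8-byte box header: version and flags (4 bytes),
    # pre_defined (4 bytes), then handler_type (4 bytes).
    return hdlr_box[16:20].decode("ascii")


def timing_is_advisory(handler: str) -> bool:
    """Map a track handler type to the timing semantics discussed above."""
    if handler == "pict":
        return True   # image sequence track: timestamps are advisory
    if handler == "vide":
        return False  # video track: timing is mandatory
    raise ValueError(f"unexpected handler type: {handler!r}")


# Build a minimal 'hdlr' box for demonstration: header, version/flags,
# pre_defined, handler_type, 12 reserved bytes, and an empty name string.
example = struct.pack(">I4sII4s12s", 33, b"hdlr", 0, 0, b"pict", b"\x00" * 12) + b"\x00"
```

Here `timing_is_advisory(handler_type(example))` evaluates to true because the example box carries the 'pict' handler.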

AOMediaCodec / av1-avif

still_picture =1 should not be a requirement for image sequences #12