AOMediaCodec / av1-rtp-spec

Current draft (HTML): https://aomediacodec.github.io/av1-rtp-spec/

[QUESTION] How are chains and frame references used? #171

Closed murillo128 closed 3 years ago

murillo128 commented 3 years ago

I have been reading the spec for weeks now and I can't understand the usage of chains, or how they are used differently from the reference frame list when deciding the decodability of a received frame, so I must have a fundamental misunderstanding.

Chains allow detecting whether the previous frame on the chain has been lost, while the frame dependencies indicate which frames are referenced by the current frame. If I understood the latter correctly, a frame cannot be decoded unless all of its referenced frames have been received, so chains play no role here.

Also, whether a frame is required by a decode target is given by the decode target indication, so chains are not required here either.

So are chains only used to decide whether a decode target is no longer decodable, so that an LRR/iframe can be requested to restart it while switching to another valid decode target in the meantime?

DanilChapovalov commented 3 years ago

The example at the very end of https://aomediacodec.github.io/av1-rtp-spec/#a611-decode-targets-decode-target-indications-and-chains tries to explain how chains help with loss: there is no switch; F5 by itself is decodable (because F1 is received, i.e. all referenced frames are received), yet in some scenarios F5 shouldn't be decoded.

Frame references describe the decodability of one current frame. A chain describes the (minimalistic) decodability of a decode target, i.e. of future frames, and especially how useful the past frames that were not received are. (DTIs describe how a frame is used for a DT in more detail than chains do, but DTIs are not available for frames that were not received.)

Yes, the main purpose of the chain is to decide whether a decode target is not decodable. However, it can still be recovered with NACK and retransmission, or just by waiting for the right frame to arrive.

From a practical point of view, chains should be used in the same places where tl0picidx is used for H.264/VP8/VP9.
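A minimal sketch of the two different checks being contrasted here: per-frame decodability from the reference list, versus chain continuity for a decode target. The data structures are hypothetical simplifications (absolute frame numbers instead of the fdiff deltas the Dependency Descriptor actually carries on the wire), using an L1T3-style pattern with illustrative numbering.

```python
def frame_is_decodable(num, received):
    """Reference-list check: a frame is decodable only if it was received
    and every frame it references is itself decodable."""
    if num not in received:
        return False
    return all(frame_is_decodable(r, received) for r in received[num]["refs"])

def chain_is_intact(num, chain, received):
    """Chain check: the previous frame on the chain (this frame's number
    minus chain_fdiff) must have been received; chain_fdiff == 0 means
    this frame starts the chain."""
    fdiff = received[num]["chain_fdiffs"][chain]
    return fdiff == 0 or (num - fdiff) in received

# Illustrative L1T3-style pattern: the chain only runs over T0 frames.
# F3 (T1) is lost, so F4 (T2, which references F3) is not decodable,
# yet the chain is still intact.
received = {
    1: {"refs": [], "chain_fdiffs": [0]},   # F1, T0 (on the chain)
    2: {"refs": [1], "chain_fdiffs": [1]},  # F2, T2
    # F3 (T1, refs F1) was lost
    4: {"refs": [3], "chain_fdiffs": [3]},  # F4, T2
}
print(frame_is_decodable(4, received))  # False: F3 is missing
print(chain_is_intact(4, 0, received))  # True: previous chain frame is F1
```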

agrange commented 3 years ago

And purely from the AV1 bitstream perspective, if a frame is lost you have no way of knowing which reference frame slot it would have updated (unless you use frame_id or other external signaling), so when you later use that slot for reference it may contain the wrong frame and lead to prediction errors.
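A hypothetical sketch (not the AV1 API) of the failure mode described here: AV1 keeps 8 reference slots, and each decoded frame may overwrite some of them. If a frame is lost, its slot updates never happen, so a later frame that predicts from those slots finds stale content.

```python
NUM_REF_SLOTS = 8  # AV1 maintains 8 reference frame slots

def apply_frame(slots, frame_id, refresh_slots):
    """Emulate a frame's slot refresh: overwrite the listed slots."""
    for s in refresh_slots:
        slots[s] = frame_id
    return slots

slots = [None] * NUM_REF_SLOTS
apply_frame(slots, "KEY", range(NUM_REF_SLOTS))  # keyframe fills all slots
# Frame "A" would have refreshed slot 0, but it was lost in transit,
# so its update is never applied and the receiver cannot know that.
apply_frame(slots, "B", [1])
# A later frame predicting from slot 0 expects "A" but finds the keyframe:
print(slots[0])  # KEY (stale), not "A"
```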


mhoro commented 3 years ago

If I recall correctly, the initial reason that Danil invented chains was to satisfy Vidyo's IDD (instantaneous decidability of decodability) feature requirement. TL0PICIDX, which Danil mentioned in an earlier comment, offers the IDD capability in a less general way. As you probably know, in practice IDD allows a system to initiate error-handling processes (e.g. retransmission) much faster than systems without IDD, thereby reducing glass-to-glass latency on lossy networks.
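A sketch of what IDD buys in practice (field names and the NACK plumbing are assumptions, not the spec): on arrival of each frame, the chain_fdiff immediately reveals whether the previous frame on the chain was lost, so recovery can start at once instead of waiting for a timeout or for decoding to fail.

```python
def on_frame_arrival(num, chain_fdiff, received, nack_out):
    """Instantly decide whether a chain frame was lost when `num` arrives.
    chain_fdiff == 0 means this frame starts the chain."""
    received.add(num)
    prev = num - chain_fdiff
    if chain_fdiff != 0 and prev not in received:
        nack_out.append(prev)  # retransmission request, issued immediately

received, nacks = set(), []
on_frame_arrival(1, 0, received, nacks)   # start of chain
on_frame_arrival(5, 4, received, nacks)   # previous chain frame F1: present
on_frame_arrival(13, 4, received, nacks)  # F9 on the chain was lost -> NACK
print(nacks)  # [9]
```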

murillo128 commented 3 years ago

Thanks for the explanations. The main problem for me was understanding the difference in usage between chains and frame references.

I think the only question left for me is whether it is possible for a frame to be non-decodable (because some of its reference frames are missing) while the chain is still decodable. The frame would have to be discardable, and the missing referenced frame non-discardable but not present in the chain. I would say yes, but I would like confirmation.

The SFU process would then be something like this (without taking NACK/RTX into account, which I don't have integrated in the layer-forwarding code):

DanilChapovalov commented 3 years ago

Yes, it is possible for a frame to be non-decodable while the chain is still decodable. E.g. in an L1T3 structure, when a T1 frame is lost, the following T2 frame will be non-decodable, but the chain is still decodable.

Moreover, such a frame doesn't have to be discardable. E.g. in the L2T3 structure (see https://aomediacodec.github.io/av1-rtp-spec/#a611-decode-targets-decode-target-indications-and-chains), imagine the SFM picks the maximum decode target (DT3) and receives F1, F2, F6. F6 would not be decodable (because F5 was missed) and is not discardable (because F8 refers to it), but the chain is still decodable: dropping F6 would require dropping F8 too, but F9, F10, and all following frames will be decodable.
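A tiny sketch of the discardability part of this example, reduced to only the references stated in the comment (F6 references the lost F5, F8 references F6); everything else is omitted, so this is not the full L2T3 structure:

```python
def is_discardable(num, frames):
    """A frame can be dropped without forcing further drops only if no
    other frame references it."""
    return all(num not in f["refs"] for f in frames.values())

# Only the references stated above; other frames' refs omitted.
frames = {6: {"refs": [5]}, 8: {"refs": [6]}}
received = {1, 2, 6}                      # F5 was lost
print(5 in received)                      # False -> F6 is not decodable
print(is_discardable(6, frames))          # False -> dropping F6 drops F8 too
```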

Because of that, in step 2 of your description I would not recommend requesting an iframe/LRR even if the frame is not discardable. For step 3, when the chain is broken, I think it would be better to behave slightly differently:

murillo128 commented 3 years ago

The problem is being able to determine whether a different DT is acceptable or not, even for choosing the initial DT to forward.

Currently, AFAIK, there is no information on how to map a decode target to a concrete spatial or temporal layer id. All I do is traverse the list of templates whose DTI for that DT is not "-" and collect the potential sid/tid values.

Furthermore, is it possible to have more than one DT for a <sid, tid> pair? Is it possible to differentiate the S and non-S modes just from the DD, to know whether I can jump to a lower temporal or spatial layer?
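The template traversal described above can be sketched as follows (the template records and the one-character DTI encoding, with "-" meaning not-present, are illustrative assumptions, not the DD wire format):

```python
def map_decode_targets(templates, num_dts):
    """For each decode target, collect the (sid, tid) of every template
    whose DTI for that target is not '-'."""
    dt_layers = {dt: set() for dt in range(num_dts)}
    for t in templates:
        for dt, dti in enumerate(t["dtis"]):
            if dti != "-":
                dt_layers[dt].add((t["sid"], t["tid"]))
    return dt_layers

# L1T2-style example with two decode targets (DT0 = T0 only, DT1 = T0+T1):
templates = [
    {"sid": 0, "tid": 0, "dtis": ["S", "S"]},
    {"sid": 0, "tid": 1, "dtis": ["-", "D"]},
]
print(map_decode_targets(templates, 2))
# {0: {(0, 0)}, 1: {(0, 0), (0, 1)}}
```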

DanilChapovalov commented 3 years ago

There are two ways to know you can switch to a decode target. One way uses the DTI: if a frame is decodable and has a Switch indication for that decode target, you can switch to it. The other way uses the chain: if the chain protecting a decode target is not broken, you can switch to it. For the scenario you've described (the chain for the current decode target got broken), the second way is more appropriate.

Yes, there is no straightforward way to map a decode target to a spatial/temporal id pair. While the spatial and temporal ids of individual frames are provided, they are barely used by the dependency descriptor. It is allowed for two different decode targets to map to the same <sid, tid> pair. It is not possible to differentiate the S and non-S modes; it can also be a mix of the two (e.g. S2 frames do not refer to S1 frames, but S1 frames do refer to S0 frames). However, you do not need to know the mode to decide whether you can jump to another decode target.

As I understand it, you are asking about a one-way jump to a different decode target (i.e. you are not worried that switching back requires a keyframe or an irregular upswitch). If you are interested in a one-way jump, watch whether the chain protecting that target is broken (i.e. it is beneficial to watch the chains for all decode targets, not just the current one). If you are interested in a two-way jump (switching to another decode target for a short while, then returning), it should be possible for two decode targets protected by the same chain, and shouldn't be feasible for two decode targets protected by two different chains.
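The two switch conditions above can be sketched as follows (all names and the one-character DTI encoding are assumptions):

```python
def can_switch_via_dti(frame_is_decodable, dti_for_target):
    """DTI-based: the frame is decodable and carries a Switch indication
    for the target decode target."""
    return frame_is_decodable and dti_for_target == "S"

def can_switch_via_chain(chain_broken_for_target):
    """Chain-based: the chain protecting the target is unbroken."""
    return not chain_broken_for_target

# It pays to track chains for ALL decode targets, not just the current one,
# so a fallback is already known when the current chain breaks:
def usable_fallbacks(chain_broken_by_dt):
    return [dt for dt, broken in sorted(chain_broken_by_dt.items())
            if not broken]

print(usable_fallbacks({0: False, 1: False, 2: True}))  # [0, 1]
```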

murillo128 commented 3 years ago

No, my question is even more basic: how does the SFU choose which DT to forward?

Currently, we all do it based on the spatial id/temporal id, or by resolution. But with the DD, there is no way for the app to choose the appropriate DT to forward.

DanilChapovalov commented 3 years ago

I think the logic to choose which DT to forward should be application specific. The (spatial id, temporal id) pair is one way to identify what to forward; the DD replaces that pair with a plain index.

Also, using the approach you've described above, it is still possible to map a DT to an (sid, tid) pair and thus associate a resolution with it. I would expect structures where two decode targets map to the same (sid, tid) to be rare and application specific, so in practice it should be possible to reuse code that identifies what to forward by the (sid, tid) pair.
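Assuming each decode target has already been mapped to the set of (sid, tid) layers it contains (e.g. by scanning the templates' DTIs), the usual layer-selection code can be reused roughly like this (all data and names are hypothetical):

```python
def choose_decode_target(dt_layers, want_sid, want_tid):
    """Pick the decode target whose highest (sid, tid) layer matches the
    application's target layer; None if there is no match."""
    for dt, layers in dt_layers.items():
        if max(layers) == (want_sid, want_tid):
            return dt
    return None

# Two decode targets of an L1T2-style structure:
dt_layers = {0: {(0, 0)}, 1: {(0, 0), (0, 1)}}
print(choose_decode_target(dt_layers, 0, 1))  # 1
print(choose_decode_target(dt_layers, 0, 0))  # 0
```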

murillo128 commented 3 years ago

> I think the logic to choose which DT to forward should be application specific.

The logic is app specific, but the application should be able to choose it.

> The (spatial id, temporal id) pair is one way to identify what to forward; the DD replaces that pair with a plain index.

Exactly, and the application does not know the meaning of each index, so the indexes are useless to the app.

> Also, using the approach you've described above, it is still possible to map a DT to an (sid, tid) pair and thus associate a resolution with it.

How? I have not been able to find a way to do it yet.

> I would expect structures where two decode targets map to the same (sid, tid) to be rare and application specific, so in practice it should be possible to reuse code that identifies what to forward by the (sid, tid) pair.

Having to signal the decode target meanings offline makes the whole DD idea useless.

murillo128 commented 3 years ago

Split into two issues for easier tracking.