chromeos / cros-codecs

BSD 3-Clause "New" or "Revised" License
30 stars 11 forks source link

Flushing is fishy #30

Closed Gnurou closed 1 year ago

Gnurou commented 1 year ago

When flushing, we current clear all the reference frames, e.g. for VP9:

    fn flush(&mut self) {
        // Note: all the submitted frames are already in the ready queue.
        self.reference_frames = Default::default();
        self.decoding_state = DecodingState::AwaitingStreamInfo;
    }

However, the parser still contains some state related to the reference frames, and other preserved state should also probably be invalid since we are in AwaitingStreamInfo again. Going to this state also means that we will emit a DRC event even if the resolution has not actually changed.

We should probably do either more or less, but the current amount of state clearing does not seem very consistent.

dwlsalmeida commented 1 year ago

I just checked the GStreamer implementation again. I am mentioning GStreamer as an example because people coming from other stacks may expect the same semantics when referring to common terms such as flush.

However, the parser still contains some state related to the reference frames, and other preserved state should also probably be invalid since we are in AwaitingStreamInfo again.

I do not think this is a problem, because we will only act upon that state once a keyframe is processed, and key frames reset the decoding state anyway, but nevertheless we can improve.

Going to this state also means that we will emit a DRC event even if the resolution has not actually changed.

This should be fixed, yes. We already keep track of the current resolution and we should transition into DecodingState::Decoding directly if we do not detect a change in the stream parameters.

One of the things I have been meaning to do is to hold the negotiation parameters into a single struct, as I've done with h265 . This makes the negotiation_needed code more straightforward:

Maybe we can retrofit this for the other codecs as:

/// Decoder implementations can use this struct to represent their decoding state.
#[derive(Default)]
enum DecodingState<T> {
    /// Decoder will ignore all input until format and resolution information passes by.
    #[default]
    // We may have seem a bit of the stream already.
    AwaitingStreamInfo(Option<NegotiationInfo>),
    /// Decoder is stopped until the client has confirmed the output format.
    AwaitingFormat(T),
    /// Decoder is currently decoding input.
    Decoding {
       // The actual decoding state. E.g. for VP9 that would mean: 

        /// The reference frames in use.
        reference_frames: [Option<T>; NUM_REF_FRAMES],
        /// Per-segment data.
        segmentation: [Segmentation; MAX_SEGMENTS],
        /// Keeps track of the last values seen for negotiation purposes.
        negotiation_info: NegotiationInfo

       // Note that we can transition here directly if we came from
       // AwaitingStreamInfo(Some(NegotiationInfo)) and no changes were detected.
    },
}
Gnurou commented 1 year ago

I do not think this is a problem, because we will only act upon that state once a keyframe is processed, and key frames reset the decoding state anyway, but nevertheless we can improve.

You are correct indeed. I guess your new proposal for DecodingState would make things safer though - and maybe even bring us the potential to factorize more of the decoder code if we can make the content of Decoding generic as well!

So the problem we currently have is that since we keep the pre-reset resolution, negotiation_possible never returns true unless the resolution has changed post-reset and we stay forever in the AwaitingStreamInfo state, ignoring all input. We can probably fix that by making NegotiationInfo a generic parameter of DecodingState and letting each decoder put what they need there.

Gnurou commented 1 year ago

Fixed this by adding a dedicated Reset state. This is basically equivalent to adding a parameter to AwaitingStreamInfo, but is also easier to match.

I still think we should try to move stuff into Decoding to improve this further and maybe share more of the decoder code.

With this change seeking works flawlessly in crosvm! \o/