chromeos / cros-codecs

BSD 3-Clause "New" or "Revised" License
30 stars 11 forks source link

We should not update the parser state when peeking a SPS #33

Closed dwlsalmeida closed 1 year ago

dwlsalmeida commented 1 year ago

I found out this corner case when developing h.265. Apparently, this is not picked up by any of the tests in the h264 suites we are using, but nevertheless it is still a possibility in the wild.

This line is wrong, because it updates the parser state with a new SPS before processing any pending slices. This can break the decoding process if we send the decoder such a sequence of NALUs:

<old slice> <old slice> <new SPS> <new slice>

The current code will peek the SPS and update the parser state before processing <old slice>s. In the likely event that the SPS simply overwrites a previous SPS held in the parser (by using the same sps_id, e.g.: 0), then the old slices may wrongly refer to a new SPS.

It can be a difficult bug to track down as well.

Note that we are not in control of the sequence of NALUs we are given, as our frame iterators are only used by ccdec, while the client code is in charge of doing that process when using cros-codecs as a library. For our virtualization use-case, it is expected that the guest userspace will have a sane implementation before submitting the data to the virtio video driver.

In order to fix this, we should introduce a new parser function, peek_sps(), which either doesn't take self or takes an immutable reference. This new function will parse a SPS without saving it in the parser. The current function, parse_sps() can be redefined in terms of peek_sps() + an internal save_sps() parser function.

This design fixed a crash in one of the h265 tests.

Gnurou commented 1 year ago

Ah indeed this doesn't look right. Is there anything that prevents us from processing all the NALUs sequentially instead of looking for a SPS first?

Or would it help if we made decode return the number of bytes processed in the input, so the caller knows they need to call it again in case it returned without processing all the stream (which would happen if we e.g. have a SPS in the middle like your example)?

dwlsalmeida commented 1 year ago

Ah indeed this doesn't look right. Is there anything that prevents us from processing all the NALUs sequentially instead of looking for a SPS first?

I don't think so, in fact this is what Chromium and GStreamer do. That is, they parse the NALUs in order, and if they see a SPS, they will renegotiate.

If, for whatever reason, someone sends <slice>, <sps>, <pps> this will return an error though.

I think it is more sane if we just let it error out though, as I do not think it is sane for any stream to contain slice data before its respective SPS and PPS are parsed anyways, in which case we can get rid of peek_sps altogether.

Or would it help if we made decode return the number of bytes processed in the input, so the caller knows they need to call it again in case it returned without processing all the stream (which would happen if we e.g. have a SPS in the middle like your example)?

Even if we remove peek_sps() altogether, we still need to implement the above somehow, see #35.

My original idea was to return the current offset in DecodingState::AwaitingFormat, but returning it from decode() does make more sense indeed.

dwlsalmeida commented 1 year ago

This is fixed by #37

We now process each NAL by itself and if a SPS is not available when one is needed we reject the stream.