chromeos / cros-codecs

BSD 3-Clause "New" or "Revised" License

decode() should process a single unit or be able to partially process input #37

Closed Gnurou closed 1 year ago

Gnurou commented 1 year ago

The current implementation of decode expects that the input buffer will contain one or more frames' worth of data. This constrains clients, who cannot e.g. send a single slice of a multi-slice H.264 frame, and the decoder, which should ideally check that there are enough resources to process the whole input but currently cannot guarantee that when the input spans multiple frames.

We could change the signature of decode to something like this:

fn decode(&mut self, timestamp: u64, mut bitstream: &[u8]) -> Result<usize, DecodeError> {

In this new implementation, decode only processes a single unit of data. For H.26x this would be a single NAL unit, for VP8/VP9 a single frame.

The return value is the number of bytes processed from bitstream. If it is less than bitstream.len(), then the caller should call decode again on &bitstream[return_value..] until the whole bitstream is processed.
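The resulting caller-side loop might look like the following sketch. The `Decoder` type here is a hypothetical stand-in, not the cros-codecs decoder; to keep the example self-contained, one "unit" is simply up to 4 bytes of input, standing in for a NAL unit or frame:

```rust
#[derive(Debug)]
struct DecodeError;

/// Toy stand-in for a stateless decoder implementing the proposed API.
struct Decoder {
    units: usize, // number of units processed so far, for demonstration
}

impl Decoder {
    fn new() -> Self {
        Decoder { units: 0 }
    }

    /// Proposed signature: process at most one unit of data and return
    /// how many bytes of `bitstream` were consumed.
    fn decode(&mut self, _timestamp: u64, bitstream: &[u8]) -> Result<usize, DecodeError> {
        // Pretend one "unit" is up to 4 bytes.
        let consumed = bitstream.len().min(4);
        if consumed > 0 {
            self.units += 1;
        }
        Ok(consumed)
    }
}

fn main() {
    let mut decoder = Decoder::new();
    let mut bitstream: &[u8] = &[0u8; 10];
    // Keep calling decode on the unprocessed tail until everything is consumed.
    while !bitstream.is_empty() {
        let processed = decoder.decode(0, bitstream).expect("decode failed");
        bitstream = &bitstream[processed..];
    }
    println!("processed {} units", decoder.units);
}
```

Note that rebinding `bitstream` to its own tail is also why a `mut` binding shows up in the proposed signature.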

This has the advantage of removing a loop from every implementation of decode, and will allow decode to detect when it doesn't have enough free output buffers to perform the operation and bail out in that case.

On the other hand, this means that H.264 (and probably H.265) will have to keep the current picture in its state again, and also that it won't be able to detect the end of a picture until the first buffer of the next picture is queued or flush is called.
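That state-keeping could be sketched roughly as follows. All types and field names here are hypothetical placeholders, not the actual cros-codecs H.264 types; a change in `frame_num` stands in for the real picture-boundary detection:

```rust
/// Hypothetical picture being assembled across decode calls.
struct Picture {
    frame_num: u32,
    slices: usize,
}

/// Sketch of a decoder that keeps the current picture in its state.
struct H264Decoder {
    cur_pic: Option<Picture>,
    finished: Vec<Picture>,
}

impl H264Decoder {
    fn new() -> Self {
        H264Decoder { cur_pic: None, finished: Vec::new() }
    }

    /// Handle one slice; a changed `frame_num` signals a new picture, which
    /// is the only point where the previous one can be finished.
    fn decode_slice(&mut self, frame_num: u32) {
        match self.cur_pic.take() {
            Some(pic) if pic.frame_num == frame_num => {
                // Same picture: accumulate the slice.
                self.cur_pic = Some(Picture { slices: pic.slices + 1, ..pic });
            }
            other => {
                // New picture: finish the previous one, if any, then start fresh.
                if let Some(pic) = other {
                    self.finished.push(pic);
                }
                self.cur_pic = Some(Picture { frame_num, slices: 1 });
            }
        }
    }

    /// flush must finish the pending picture, since no later unit will.
    fn flush(&mut self) {
        if let Some(pic) = self.cur_pic.take() {
            self.finished.push(pic);
        }
    }
}
```

The last picture of a stream is only emitted by `flush`, which is exactly the detection limitation described above.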

Gnurou commented 1 year ago

@dwlsalmeida WDYT? Am I missing something important here or do you think we can switch to this model for stateless decoders?

Gnurou commented 1 year ago

Solving this would also probably solve #34 and help with #33.

dwlsalmeida commented 1 year ago

Hi @Gnurou

As we spoke during the call, yes this new design is better than what we have currently.

A few things on my mind:

a) why is the bitstream now mut in the signature?

b) flush() will need to attempt to submit pending work and block on its completion before doing anything else. For H.264, this means calling a variation of the following:

    let slice = self.parser.parse_slice_header(nalu)?;
    let mut cur_pic = match cur_pic_opt {
        // No current picture, start a new one.
        None => self.begin_picture(timestamp, &slice)?,
        // We have a current picture but are starting a new field: finish it and
        // start a new one.
        Some(cur_pic)
            if self.dpb.interlaced()
                && matches!(cur_pic.0.field, Field::Frame)
                && !cur_pic.0.is_second_field()
                && cur_pic.0.field != slice.header().field() =>
        {
            self.finish_picture(cur_pic)?;
            self.begin_picture(timestamp, &slice)?
        }
        // This slice is part of the current picture.
        Some(cur_pic) => cur_pic,
    };

    self.handle_slice(&mut cur_pic, &slice)?;
    cur_pic_opt = Some(cur_pic);

    <snip>

    if let Some(cur_pic) = cur_pic_opt.take() {
        self.finish_picture(cur_pic)?;
    }

Since this is now called from two different paths, IMHO we should extract it into its own function, maybe fn decode_slice_nalu? This function would then call begin_picture(), handle_slice(), and finish_picture() as needed.
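The shape of that extraction could be something like the sketch below. Only the name decode_slice_nalu comes from the suggestion above; everything else is a hypothetical placeholder (the current picture is modeled as a plain `String` just to make the flow visible):

```rust
/// Hypothetical decoder showing decode and flush sharing one slice path.
struct Decoder {
    cur_pic: Option<String>, // stand-in for the current-picture state
    output: Vec<String>,     // stand-in for finished pictures
}

impl Decoder {
    /// Shared helper called from both decode and flush paths: begin a
    /// picture if there is none, then handle one slice NALU.
    fn decode_slice_nalu(&mut self, nalu: &str) {
        if self.cur_pic.is_none() {
            self.cur_pic = Some(String::new()); // begin_picture()
        }
        if let Some(pic) = self.cur_pic.as_mut() {
            pic.push_str(nalu); // handle_slice()
        }
    }

    /// Finish the current picture, if any.
    fn finish_picture(&mut self) {
        if let Some(pic) = self.cur_pic.take() {
            self.output.push(pic);
        }
    }

    /// flush submits the pending picture before doing anything else.
    fn flush(&mut self) {
        self.finish_picture();
    }
}
```

With the begin/handle/finish logic in one helper, neither decode nor flush needs to duplicate it.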

c) we should rename decode_access_unit, as this will not be an access unit anymore. Maybe fn decode_nalu?

> On the other hand, this means that H.264 (and probably H.265) will have to keep the current picture in its state again

I don't think this is a problem since we do not unwrap it anymore.

> and also that it won't be able to detect its end until the first buffer of the next picture is queued or flush is called.

I don't think this is a problem. In non-blocking mode the state tracker will usually be far ahead of the accelerator anyway; that's the whole point of non-blocking mode, to let the parser and the decoder work ahead while the GPU is busy actually decoding the slice data.

dwlsalmeida commented 1 year ago

See #38 and #39

dwlsalmeida commented 1 year ago

simple_playback_loop was also refactored by 6cf1c147

I think we can close this