Closed by Gnurou 1 year ago
@dwlsalmeida WDYT? Am I missing something important here or do you think we can switch to this model for stateless decoders?
Solving this would also probably solve #34 and help with #33.
Hi @Gnurou
As we discussed during the call: yes, this new design is better than what we currently have.
A few things on my mind:
a) why is the bitstream now `mut` in the signature?
b) `flush()` will need to attempt to submit work and block on its completion before doing anything else. For H.264, this means calling a variation of the following:
```rust
let slice = self.parser.parse_slice_header(nalu)?;
let mut cur_pic = match cur_pic_opt {
    // No current picture, start a new one.
    None => self.begin_picture(timestamp, &slice)?,
    // We have a current picture but are starting a new field: finish it and
    // start a new one.
    Some(cur_pic)
        if self.dpb.interlaced()
            && matches!(cur_pic.0.field, Field::Frame)
            && !cur_pic.0.is_second_field()
            && cur_pic.0.field != slice.header().field() =>
    {
        self.finish_picture(cur_pic)?;
        self.begin_picture(timestamp, &slice)?
    }
    // This slice is part of the current picture.
    Some(cur_pic) => cur_pic,
};
self.handle_slice(&mut cur_pic, &slice)?;
cur_pic_opt = Some(cur_pic);
```

<snip>

```rust
if let Some(cur_pic) = cur_pic_opt.take() {
    self.finish_picture(cur_pic)?;
}
```
Since this is now called from two different paths, IMHO we should extract it into its own function. Maybe `fn decode_slice_nalu`? This function would then call `begin_picture()`, `handle_slice()` and `finish_picture()` as needed.
c) we should rename `decode_access_unit`, as this will not be an access unit anymore. Maybe `fn decode_nalu`?
> On the other hand, this means that H.264 (and probably H.265) will have to keep the current picture in its state again

I don't think this is a problem since we do not unwrap it anymore.

> and also that it won't be able to detect its end until the first buffer of the next picture is queued or flush is called.

I don't think this is a problem. In non-blocking mode the state tracker will usually be well ahead of the accelerator anyway (that's the whole point of non-blocking mode: to let the parser and the decoder work ahead while the GPU is busy actually decoding the slice data).
See #38 and #39
`simple_playback_loop` was also refactored by 6cf1c147.

I think we can close this.
The current implementation of `decode` expects the input buffer to contain one or more frames' worth of data. This constrains both clients, which cannot e.g. send a single slice of a multi-slice H.264 frame, and the decoder, which should ideally check that there are enough resources to process the whole input but currently cannot guarantee that if there are multiple frames.

We could change the signature of `decode` to something like this:
In this new implementation, `decode` only processes a single unit of data. For H.26x this would be a single NAL unit; for VP8/VP9, a single frame. The return value is the number of bytes processed in `bitstream`. If it is less than `bitstream.len()`, then the caller should call `decode` again on `&bitstream[return_value..]` until the whole bitstream is processed.

This has the advantage of removing a loop in every implementation of `decode`, and will allow `decode` to detect when it doesn't have enough free output buffers to perform the operation and bail out in that case.

On the other hand, this means that H.264 (and probably H.265) will have to keep the current picture in its state again, and also that it won't be able to detect its end until the first buffer of the next picture is queued or `flush` is called.
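The calling convention described above could look roughly like this. All names and types here are illustrative stand-ins inferred from the discussion (including the `&mut` on `bitstream`, which comment (a) asks about), not the actual proposed trait:

```rust
#[derive(Debug, PartialEq)]
struct DecodeError;

// Hypothetical trait sketch; the exact signature from the proposal is not
// reproduced in this thread.
trait StatelessDecoder {
    /// Processes a single unit of data (one NAL unit for H.26x, one frame
    /// for VP8/VP9) and returns the number of bytes consumed.
    fn decode(&mut self, timestamp: u64, bitstream: &mut [u8]) -> Result<usize, DecodeError>;
}

/// Toy decoder that "consumes" fixed-size units, just to demonstrate the
/// partial-consumption contract.
struct ChunkDecoder {
    unit_size: usize,
    units_decoded: usize,
}

impl StatelessDecoder for ChunkDecoder {
    fn decode(&mut self, _timestamp: u64, bitstream: &mut [u8]) -> Result<usize, DecodeError> {
        let consumed = bitstream.len().min(self.unit_size);
        if consumed == 0 {
            return Err(DecodeError);
        }
        self.units_decoded += 1;
        Ok(consumed)
    }
}

/// Caller-side loop: keep calling `decode` on the unprocessed tail until
/// the whole bitstream has been consumed.
fn decode_all<D: StatelessDecoder>(
    decoder: &mut D,
    timestamp: u64,
    bitstream: &mut [u8],
) -> Result<(), DecodeError> {
    let mut offset = 0;
    while offset < bitstream.len() {
        offset += decoder.decode(timestamp, &mut bitstream[offset..])?;
    }
    Ok(())
}
```

This makes the loop a one-time cost in the caller (or a shared helper) rather than something every codec's `decode` must reimplement.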