Create simpler facades for decoding steps

dholroyd commented 4 years ago

Many pieces of functionality currently exposed by this create for decoding layers of the H264 spec exposes an API which...

accepts partial input
parses the data incrementally
potentially produces partial / intermediate results as data arrives

...all with the goal of allow the implementation to avoid copying data.

This is useful for performance reasons in some workloads, but the API is really complicated and a bit hard to use.

It would be nice to provide alternative APIs that work much more simply when,

the caller already has the complete input data available in a contiguous buffer
maybe they prefer a simple life, even at the expense of decoding being a tiny bit more costly (because the API can only be implemented by coping the source data, at least for some inputs)

These simpler APIs would exist in addition to the performant-but-inconvenient interfaces. The implementations should share as much code as possible (very likely, the 'simple' API could be implemented as a higher-level abstraction on top of the complicated one).

dholroyd commented 4 years ago

Useful feedback on this topic from @scottlamb in https://github.com/dholroyd/h264-reader/issues/3#issuecomment-600997848

scottlamb commented 3 years ago

Continuing from the comment you mentioned: I'm working on my RTSP crate (now in its own repository and MIT/Apache-licensed). I've realized that on the wire, almost all the VCL NALU are fragmented. Right now I accumulate the fragments via copying into a bytes::BytesMut, but I eventually plan to use multi-chunk bytes::Bufs instead that reference the original bytes::Bytes via reference-counting.

I'd also like to eventually decode the slice headers (to track reference frames). I think my ideal is something like this:

Write a h264_reader::rbsp::RbspBufDecoder which takes a &mut dyn Buf and provides a std::io::Read implementation that lazily processes the Buf, skipping the emulation prevention three bytes. It could be behind a bytes feature to avoid adding a required dependency on bytes.
Have h264_reader::rbsp::RbspBitReader take a std::io::Read instead of a slice. I think would require a switch from bitreader to bitstream-io, but anyway bitstream-io is (a) more popular (b) used in eg your adts-reader crate already, and (c) faster, according to the benchmarks here.

This would allow parsing the slice headers without allocation or copying for RBSP processing. Most of the NAL wouldn't be even examined because presumably the slice headers are a very short prefix of it.

I might eventually send you a PR for this if you like this plan. I've got a bunch of other code to write before I get too serious about processing the slice headers though.

scottlamb commented 3 years ago

Actually, I have a much more radical idea. The only emulation prevention three byte-removing logic can be a std::io::Read wrapper around std::io::BufRead, to be used by push parser case and my case (giving it a full NAL, TBD whether that's a single &[u8] or a non-contiguous bytes::Buf). It can keep the RbspDecoder name.

The push parser case would give it a std::io::BufRead implementation that accumulates NAL bytes on push. It'd return ErrorKind::WouldBlock on hitting the end of the buffer if end hasn't been called. RbspDecoder would pass that through, as would RbspBitDecoder (both in read_* and in has_more_rbsp_data), and any slice parsers that want to be capable of trying to parse partial data.

For most NAL types it'd make sense to only try parsing once in end. For slice NALs, where SliceHeader parsing can succeed without most of the bytes, it'd probably make sense to instead try on each push call and stop buffering once parsing returns non-incomplete (success or failure). It'd try from scratch each time.

I think this would simplify the logic for the push parsing as well as sharing more of it with the already-buffered parsing. There'd be just one place push parsing buffers instead of three and counting (PicParameterSetNalHandler, SeqParameterSetNalHandler, and SeiBuffer). And it'd allow finishing the SliceLayerWithoutPartitioningRbsp implementation which I suspect is unfinished in part because ParseState::Continue(header) is awkward to deal with.

Also, I imagine some of the time the caller wants to both parse and keep the NAL bytes to send onward (similar to my RTSP library). This might allow them to share the buffer (at the cost of some more API complexity).

A code sketch:

pub trait RbspHandler {
    type Read: RbspBitRead;
    type Ctx;

    /// Processes a partially buffered NAL unit. `read` always reads from the beginning.
    /// Returns false if no additional calls are desired for this NAL unit.
    fn partial(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        true
    }

    /// Processes a fully buffered NAL unit. `read` always reads from the beginning.
    /// Called iff every prior `partial` call for this NAL unit (zero or more) returned true.
    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) {}
}

impl<Read: RbspBitRead, Ctx> RbspHandler for PicParameterSetNalHandler<Ctx> {
    type Read = Read;
    type Ctx = Ctx;

    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) {
        match PicParameterSet::read(ctx, read) {
            Ok(pps) => ctx.put_pic_param_set(pps),
            Err(e) => error!("pps: {:?}", e),
        }
    }
}

impl<Read: RbspBitRead, Ctx> RbspHandler for SliceNalHandler<Ctx> {
    type Read = Read;
    type Ctx = Ctx;

    fn partial(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        match SliceHeader::read(ctx, read) {
            Ok(header) => {
                info!("TODO: expose to caller: {:#?}", header);
                false
            },
            Err(SliceHeaderError::Incomplete) => true,
            Err(e) => {
                error!("slice_header() error: SliceHeaderError::{:?}", e);
                false
            },
        }
    }

    fn complete(&mut self, ctx: &mut Context<Self::Ctx>, header: NalHeader, read: Self::Read) -> bool {
        match SliceHeader::read(ctx, read) {
            Ok(pps) => info!("TODO: expose to caller: {:#?}", header),
            Err(e) => error!("slice_header() error: SliceHeaderError::{:?}", e),
        }
    }
}

What do you think?

scottlamb commented 3 years ago

Also, I imagine some of the time the caller wants to both parse and keep the NAL bytes to send onward (similar to my RTSP library). This might allow them to share the buffer (at the cost of some more API complexity).

On second thought, I don't think it's any more complex. Also, I think we could get rid of NAL/SEI switches, RefCell, user_context, and the need to add plumbing for error reporting. Altogether I find the API sketch below a lot easier to understand than the status quo.

NAL handlers could be a lambda rather than a trait; a little lighter-weight but less explicit. I could go either way on that part.

mod nal {
    /// A single, partially- or completely-buffered NAL.
    /// The push module uses a [ByteSliceNal], but it's possible to implement
    /// this interface with any type that can produce a `std::io::BufRead`.
    pub trait Nal {
        type BufRead: std::io::BufRead;

        /// Returns if the NAL is completely buffered.
        pub fn complete(&self) -> bool;

        /// Returns the NAL header or error if missing/corrupt.
        pub fn header(&self) -> Result<NalHeader, NalHeaderError> { ... }

        /// Returns the bytes in NAL form (including the header byte and
        /// any emulation-prevention-three-bytes). If the NAL is incomplete,
    /// reads may fail with `ErrorKind::WouldBlock`.
        pub fn reader(&self) -> Self::BufRead;

        /// Returns the bytes in RBSP form (skipping header byte and
        /// emulation-prevention-three-bytes).
        pub fn rbsp_bytes(&self) -> rbsp::ByteReader<Self::BufRead> { ... }

        /// Returns a helper to access bits within the RBSP.
        pub fn rbsp_bits(&self) -> rbsp::BitReader<rbsp::ByteReader<Self::BufRead>> { ... }
    }

    /// A NAL which stores its bytes contiguously.
    pub struct ByteSliceNal<'a> { buf: &'a [u8], complete: bool }
    impl<'a> Nal for ByteSliceNal<'a> { ... }
    impl<'a> AsRef<u8> for ByteSliceNal<'a> { ... }
}

mod push {
    /// The results of a NAL handler during push parsing.
    pub enum HandlerResult {
        /// If this NAL is incomplete, continue buffering it for a following call.
        ReadMore,

        /// Stop buffering this NAL; move on to the next one immediately.
        /// This is no different than `ReadMore` when the NAL is complete.
        Skip,

        /// Abort the push call with the given error. Following pushes will also fail.
        AbortPush(Box<dyn Error>),
    }

    /// Accumulator for NAL push parsers.
    /// This is meant to be used by parsers for a specific format: Annex B, AVC, MPEG-TS, RTP, etc.
    /// Accumulates NALs in an internal buffer and delegates to a caller-supplied handler.
    pub struct Accumulator<'a> {
        buf: Vec<u8>,
        state: AccumulatorState,
        handler: &'a mut FnMut(ByteSliceNal),
    }

    impl Accumulator {
        pub fn new(nal_handler: &mut FnMut(ByteSliceNal) -> HandlerResult) -> Self;

        pub fn start(&mut self);
        // TODO: maybe should be push(..., end: bool) instead to avoid an extra !complete call to nal_handler.
        pub fn push(&mut self, buf: &[u8]) -> Result<(), Error>;
        pub fn end(&mut self);
    }

    /// Push parser for an Annex B form bytestream
    /// (one with NALs separated by `00 00 01` or `00 00 00 01`).
    pub struct AnnexBParser<'a> { accumulator: Accumulator<'a>, state: AnnexBParseState }

    impl<'a> AnnexBParser<'a> {
        pub fn new(nal_handler: &mut FnMut(nal: ByteSliceNal) -> HandlerResult) -> Self;
        pub fn push(&mut self, buf: &[u8]) -> Result<(), Error>;
        pub fn end_units(&mut self) -> Result<(), Error>;
    }
}

/// RBSP stuff, independent of push or pull parsing.
mod rbsp {
    /// Strips header byte and `emulation-prevention-three-byte` elements from the single-NAL input
    /// to yield the RBSP.
    pub struct ByteReader<N: std::io::BufRead> { ... }
    impl<N: std::io::BufRead> std::io::Read for ByteReader<N> { ... }
    impl<N: std::io::BufRead> std::io::BufRead for ByteReader<N> { ... }

    /// Reads the RBSP as bits given a byte-level reader.
    // (This only needs `BufRead` rather than `Read` for `has_more_rbsp_data`.)
    pub struct BitReader<R: std::io::BufRead>(bitstream_io::Reader<R, bitstream_io::BigEndian>);
    impl<R: std::io::BufRead> BitReader<R> {
        ...
    }
}

// Example usage
fn my_annexb_reader() {
    let mut ctx = Context::new();
    let mut parser = h264_reader::push::AnnexBParser::new(|nal| {
        match nal.header().unit_type() {
            Ok(UnitType::PicParameterSet) if !nal.complete() => return NalResult::ReadMore,
            Ok(UnitType::PicParameterSet) => {
                match PicParameterSet::read(ctx, nal.rbsp_bits()) {
                    Ok(pps) => ctx.put_pic_param_set(pps),
                    Err(e) => error!("Bad pps: {:?}", e),
                }
            },
            Ok(UnitType::SliceWithPartitioningIdr) => {
                match SliceHeader::read(ctx, nal.rbsp_bits()) {
                    Ok(hdr) => info!("Slice header: {:#?}", hdr),
                    Err(SliceHeaderError::NeedMoreData) => return NalResult::ReadMore,
                    Err(e) => error!("Bad slice header: {:?}", e),
                }
            },
            Ok(_) => {},
            Err(e) => error!("bad NAL header: {:#?}", e),
        }
        NalResult::Skip
    });
    let f = File::open("foo.h264").unwrap();
    let mut reader = BufReader::new(f);
    loop {
        let buf = reader.fill_buf().unwrap();
        if buf.is_empty() {
            break;
        }
        parser.push(buf);
        read.consume(buf.len());
    }
}

dholroyd commented 3 years ago

Does the underlying use of Read / BufRead mean that the implementation must end up copying all bytes out of the source buffers and into some destination buffer? It seems to me that it would, but I might be missing a trick! :)

My hope has always been to be able to extract metadata (sps + pps now; eventually slice_header() too) while avoiding copying the bytes of slice_data( ).

All this said, if there is an irreconcilable choice between either API usability or avoiding slice_data() copies, I think API usability needs to win.

dholroyd commented 3 years ago

@scottlamb I've added you to the project.

Please continue to PR changes; however if I don't respond for a few days and you are blocked then please know that I trust your sense of taste and am ok with you merging in order to move forward.

Don't feel that this places an obligation on you to pick up maintenance work if you don't have the time or inclination - I'm just aiming to make life easy.

For my part I will try to get into the habit of PR'ing my own changes rather than committing direct to master.

scottlamb commented 3 years ago

Does the underlying use of Read / BufRead mean that the implementation must end up copying all bytes out of the source buffers and into some destination buffer? It seems to me that it would, but I might be missing a trick! :)

After some tweaking of the details (now in draft PR #26): it copies on push only after the handler requests to see an incomplete NAL again. So if the whole NAL is in one push, there's no copying. If the SliceHeader can be parsed from the first push alone, there's no copying. If the SliceHeader can be parsed on the second push, it copies the first but not the second.

I think this is strictly an improvement over what PicParameterSetNalHandler, SeqParameterSetNalHandler, and SeiBuffer were doing and means most of a slice's bytes aren't copied even when examining the headers.

@scottlamb I've added you to the project.

Thanks for the vote of confidence!

scottlamb commented 3 years ago

For my part I will try to get into the habit of PR'ing my own changes rather than committing direct to master.

Are you looking for me to review/merge your PRs as you'd do for mine? Or is that just to kick the CI into running before you commit?

scottlamb commented 3 years ago

PR #26 is now ready to review with a significant API overhaul.

scottlamb commented 3 years ago

@dholroyd Could I talk you into taking a look at the PR? This one is much more invasive than previous changes so I've held off on merging it to get a sense how comfortable you are with the new interface.

dholroyd / h264-reader

Create simpler facades for decoding steps #4