dholroyd / h264-reader

Rust reader for H264 bitsream syntax
Apache License 2.0
73 stars 25 forks source link

It looks like VuiParameter parsing is slightly off #47

Closed Jasper-Bekkers closed 1 year ago

Jasper-Bekkers commented 1 year ago

I'm not sure if I'm doing something wrong or not (not a video expert). However, when trying to extract the h264 stream from the 1MB 1080p mp4 from here and loading it up it seems to give different results from the h264-bitstream-viewer. Specifically in the SPS VUI section.

The h264 bitstream viewer tool gives this output: image

While my Rust test app seems to give this output:

[src\main.rs:18] &sps.len() = 27
[src\main.rs:23] sps = Ok(
    SeqParameterSet {
        profile_idc: ProfileIdc(
            100,
        ),
        constraint_flags: ConstraintFlags {
            flag0: false,
            flag1: false,
            flag2: false,
            flag3: false,
            flag4: false,
            flag5: false,
            reserved_zero_two_bits: 0,
        },
        level_idc: 42,
        seq_parameter_set_id: ParamSetId(
            0,
        ),
        chroma_info: ChromaInfo {
            chroma_format: YUV420,
            bit_depth_luma_minus8: 0,
            bit_depth_chroma_minus8: 0,
            qpprime_y_zero_transform_bypass_flag: false,
            scaling_matrix: SeqScalingMatrix,
        },
        log2_max_frame_num_minus4: 0,
        pic_order_cnt: TypeZero {
            log2_max_pic_order_cnt_lsb_minus4: 3,
        },
        max_num_ref_frames: 4,
        gaps_in_frame_num_value_allowed_flag: false,
        pic_width_in_mbs_minus1: 119,
        pic_height_in_map_units_minus1: 67,
        frame_mbs_flags: Frames,
        direct_8x8_inference_flag: true,
        frame_cropping: Some(
            FrameCropping {
                left_offset: 0,
                right_offset: 0,
                top_offset: 0,
                bottom_offset: 4,
            },
        ),
        vui_parameters: Some(
            VuiParameters {
                aspect_ratio_info: Some(
                    Ratio1_1,
                ),
                overscan_appropriate: Unspecified,
                video_signal_type: None,
                chroma_loc_info: None,
                timing_info: Some(
                    TimingInfo {
                        num_units_in_tick: 768,
                        time_scale: 16777219,
                        fixed_frame_rate_flag: false,
                    },
                ),
                nal_hrd_parameters: None,
                vcl_hrd_parameters: None,
                low_delay_hrd_flag: None,
                pic_struct_present_flag: false,
                bitstream_restrictions: None,
            },
        ),
    },
)

Excuse the large screenshots and data dump, but notie how num_units_in_tick are 768 in the h264-reader case while the are expected to be 1.

I've attached the extracted h264 bitstream as a zip file and attached it to this issue though it could be extracted using Big_Buck_Bunny_1080_10s_1MB.zip

ffmpeg.exe -i C:\Users\Jasper\Downloads\Big_Buck_Bunny_1080_10s_1MB.mp4  -vcodec copy -vbsf h264_mp4toannexb -an the_bitstream.h264
Jasper-Bekkers commented 1 year ago

Turned out to be a problem with my use of the API, instead of parsing the NAL I was just skipping it, leading to some incorrect parsing.

dholroyd commented 1 year ago

Roughly how did you code need to change to make it work? I'm wondering if the API could be improved to make the incorrect usage into a compile error rather than a parse error at runtime.

Jasper-Bekkers commented 1 year ago

I was just skipping one byte of data, rather then parsing. The difference was basically the difference between these benchmarks: https://github.com/dholroyd/h264-reader/blob/e7bb3bfdb449d61a18e9cc78cdbb36755590ab57/benches/bench.rs#L134

scottlamb commented 1 year ago

Yeah, I think that code is unnecessarily confusing (my fault). The comment is true, so it works on that particular hardcoded data, but if folks are going to be looking at it to figure out how to use the API, it should actually call the decode function rather than taking a shortcut. Would https://github.com/dholroyd/h264-reader/pull/48 have helped you out?

Jasper-Bekkers commented 1 year ago

I don't think it would've helped; I arrived at it before finding the bench, however I have been looking at the benchmarks to see how I could best use this api together with the mp4 crate for example.

dholroyd commented 1 year ago

Simple newtype wrappers around the &[u8] / Cow<[u8]> types currently used in various signatures might help make it harder to plug the pieces together incorrectly; at some extra complexity cost.

pub fn decode_nal<'a>(nal_unit: &'a [u8]) -> Result<Cow<'a, [u8]>, std::io::Error> {

:arrow_down:

pub fn decode_nal<'a>(nal_unit: NalData<'a>) -> Result<RbpsData<'a>, std::io::Error> {

(Type names and implementation details not worked out properly here.)

Would this be worthwhile? Is there an even better approach?

Jasper-Bekkers commented 1 year ago

Maybe? In some cases I have I still at least need raw byte access as well so I'm not sure how helpful it would be eventually, though "just making the types work" is pretty nice to be able to do.