Stonks3141 / pet-monitor-app

A simple pet monitor for Linux.
MIT License

mp4-stream/bmff: broken video after encoding #136

Open bespegit opened 9 months ago

bespegit commented 9 months ago

Hi, I am trying to convert an MP4 file to fragmented MP4 (fMP4) using the mp4-stream and bmff crates.

Sample video: https://storage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4

I converted the MP4 to a raw H.264 bitstream with ffmpeg:

ffmpeg -i ForBiggerMeltdowns.mp4 -vcodec copy -an -bsf:v h264_mp4toannexb -f h264 ForBiggerMeltdowns.h264

Code for converting the bitstream into fMP4 using mp4-stream and bmff:

#[derive(Debug, Clone)]
struct InitSegment {
    ftyp: FileTypeBox,
    moov: MovieBox,
}

impl WriteTo for InitSegment {
    fn write_to(&self, mut w: impl Write) -> io::Result<()> {
        write_to(&self.ftyp, &mut w)?;
        write_to(&self.moov, &mut w)?;
        Ok(())
    }
}

impl InitSegment {
    fn size(&self) -> u64 {
        self.ftyp.size() + self.moov.size()
    }

    fn new() -> Self {
        let sps = vec![
            0x67, 0x64, 0x00, 0x1f, 0xac, 0xd9, 0x80, 0x50, 0x05, 0xbb, 0x01, 0x6a, 0x02, 0x02,
            0x02, 0x80, 0x00, 0x00, 0x03, 0x00, 0x80, 0x00, 0x00, 0x1e, 0x07, 0x8c, 0x18, 0xcd,
        ]; // TODO
        let pps = vec![0x68, 0xe9, 0x7b, 0x2c, 0x8b]; // TODO
        let (width, height) = (1280, 720);

        let ftyp = FileTypeBox {
            major_brand: *b"isom",
            minor_version: 0,
            compatible_brands: vec![*b"isom", *b"iso6", *b"iso2", *b"avc1", *b"mp41"],
        };

        let time = Utc::now();
        let timescale = 30;
        let duration = Some(Duration::zero());

        let moov = MovieBox {
            mvhd: MovieHeaderBox {
                creation_time: time,
                modification_time: time,
                timescale,
                duration,
                rate: I16F16::from_num(1),
                volume: I8F8::from_num(1),
                matrix: MATRIX_0,
                next_track_id: 0,
            },
            trak: vec![TrackBox {
                tkhd: TrackHeaderBox {
                    flags: TrackHeaderFlags::TRACK_ENABLED
                        | TrackHeaderFlags::TRACK_IN_MOVIE
                        | TrackHeaderFlags::TRACK_IN_PREVIEW,
                    creation_time: time,
                    modification_time: time,
                    track_id: 1,
                    timescale,
                    duration,
                    layer: 0,
                    alternate_group: 0,
                    volume: I8F8::from_num(1),
                    matrix: MATRIX_0,
                    width: U16F16::from_num(width),
                    height: U16F16::from_num(height),
                },
                tref: None,
                edts: None,
                mdia: MediaBox {
                    mdhd: MediaHeaderBox {
                        creation_time: time,
                        modification_time: time,
                        timescale,
                        duration,
                        language: *b"und",
                    },
                    hdlr: HandlerBox {
                        handler_type: HandlerType::Video,
                        name: "foo".to_string(), // TODO
                    },
                    minf: MediaInformationBox {
                        media_header: MediaHeader::Video(VideoMediaHeaderBox {
                            graphics_mode: GraphicsMode::Copy,
                            opcolor: [0, 0, 0],
                        }),
                        dinf: DataInformationBox {
                            dref: DataReferenceBox {
                                data_entries: vec![DataEntry::Url(DataEntryUrlBox {
                                    flags: DataEntryFlags::SELF_CONTAINED,
                                    location: String::new(),
                                })],
                            },
                        },
                        stbl: SampleTableBox {
                            stsd: SampleDescriptionBox {
                                entries: vec![Box::new(AvcSampleEntry {
                                    data_reference_index: 1,
                                    width: width as u16,
                                    height: height as u16,
                                    horiz_resolution: U16F16::from_num(72),
                                    vert_resolution: U16F16::from_num(72),
                                    frame_count: 1,
                                    depth: 0x0018,
                                    avcc: AvcConfigurationBox {
                                        configuration: AvcDecoderConfigurationRecord {
                                            profile_idc: 0x64, // high
                                            constraint_set_flag: 0x00,
                                            level_idc: 0x1f, // 0x1f = 31: level 3.1 (0x2a would be level 4.2)
                                            sequence_parameter_set: sps,
                                            picture_parameter_set: pps,
                                        },
                                    },
                                })],
                            },
                            stts: TimeToSampleBox { samples: vec![] },
                            stsc: SampleToChunkBox { entries: vec![] },
                            stsz: SampleSizeBox {
                                sample_size: SampleSize::Different(vec![]),
                            },
                            stco: ChunkOffsetBox {
                                chunk_offsets: vec![],
                            },
                        },
                    },
                },
            }],
            mvex: Some(MovieExtendsBox {
                mehd: None,
                trex: vec![TrackExtendsBox {
                    track_id: 1,
                    default_sample_description_index: 1,
                    default_sample_duration: 0,
                    default_sample_size: 0,
                    default_sample_flags: DefaultSampleFlags::empty(),
                }],
            }),
        };

        Self { ftyp, moov }
    }
}

#[derive(Debug, Clone)]
pub struct MediaSegment {
    moof: MovieFragmentBox,
    mdat: MediaDataBox,
}

impl MediaSegment {
    fn new(sequence_number: u32, sample_sizes: Vec<u32>, data: Vec<u8>) -> Self {
        let timescale = 30;
        let mut moof = MovieFragmentBox {
            mfhd: MovieFragmentHeaderBox { sequence_number },
            traf: vec![TrackFragmentBox {
                tfhd: TrackFragmentHeaderBox {
                    track_id: 1,
                    base_data_offset: Some(0),
                    sample_description_index: None,
                    default_sample_duration: Some(timescale * 1 / 30),
                    default_sample_size: None,
                    default_sample_flags: {
                        #[allow(clippy::unwrap_used)] // infallible
                        Some(DefaultSampleFlags::from_bits(0x0101_0000).unwrap())
                    }, // not I-frame
                    default_base_is_moof: false,
                },
                trun: vec![TrackFragmentRunBox {
                    data_offset: Some(0),
                    first_sample_flags: Some(0x0200_0000), // I-frame
                    sample_durations: None,
                    sample_sizes: Some(sample_sizes),
                    sample_flags: None,
                    sample_composition_time_offsets: None,
                }],
            }],
        };

        moof.traf[0].trun[0].data_offset = Some(moof.size() as i32 + 8);

        Self {
            moof,
            mdat: MediaDataBox {
                headers: None,
                data: Arc::new(data),
            },
        }
    }

    fn size(&self) -> u64 {
        self.moof.size() + self.mdat.size()
    }

    fn base_data_offset(&mut self) -> &mut Option<u64> {
        &mut self.moof.traf[0].tfhd.base_data_offset
    }

    fn sequence_number(&mut self) -> &mut u32 {
        &mut self.moof.mfhd.sequence_number
    }
}

impl WriteTo for MediaSegment {
    fn write_to(&self, mut w: impl Write) -> io::Result<()> {
        write_to(&self.moof, &mut w)?;
        write_to(&self.mdat, &mut w)?;
        Ok(())
    }
}

fn main() {
    let mut mp4 = Vec::new();
    let init_segment = InitSegment::new();
    init_segment.write_to(&mut mp4).unwrap();

    let data = std::fs::read("ForBiggerMeltdowns.h264").unwrap();
    let mut size = 0;
    for (i, nal) in openh264::nal_units(&data).enumerate() {
        size += nal.len();
        let sample_sizes = vec![nal.len() as _];
        let mut media_segment = MediaSegment::new(0, sample_sizes, nal.to_vec());
        *media_segment.base_data_offset() = Some(size as _);
        *media_segment.sequence_number() = (i + 1) as _;
        media_segment.write_to(&mut mp4).unwrap();
    }
    std::fs::write("test.mp4", mp4).unwrap();
}

I tested the resulting video with ffmpeg:

ffmpeg -v error -i test.mp4 -f null -

Error log:

[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (0 > 23).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (0 > 3).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (780105384 > 34653).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (183668528 > 3803).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (1929255025 > 6242).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (-1164684585 > 4619).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (-750884394 > 4414).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (-1820532732 > 3335).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (1299338831 > 3776).
[h264 @ 0x5564e04ed5c0] Error splitting the input into NAL units.
[h264 @ 0x5564e04ed5c0] Invalid NAL unit size (-1016882253 > 2624).

The output video cannot be played in Firefox or Chrome.
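
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: an MP4 track described by an avcC box is expected to carry each NAL unit with a big-endian length prefix, while the h264_mp4toannexb filter produces an Annex B stream that separates NAL units with 00 00 01 or 00 00 00 01 start codes. If start codes end up inside mdat, a decoder may read them as lengths, which would be consistent with the "Invalid NAL unit size" messages. The sketch below uses only the standard library and hypothetical helper names (split_annex_b, to_length_prefixed). It assumes a 4-byte length prefix, which must match the length size written into the avcC box; whether bmff writes that value is not confirmed here.

```rust
/// Splits an Annex B byte stream into NAL unit payloads with the start
/// codes removed. Handles both 3-byte and 4-byte start codes.
fn split_annex_b(data: &[u8]) -> Vec<&[u8]> {
    // Zero bytes right before a start code (or at the end of the stream)
    // are padding, not NAL payload, so they are trimmed here.
    fn trimmed(data: &[u8], start: usize, mut end: usize) -> Option<&[u8]> {
        while end > start && data[end - 1] == 0 {
            end -= 1;
        }
        if end > start { Some(&data[start..end]) } else { None }
    }

    let mut nals = Vec::new();
    let mut start: Option<usize> = None; // payload start of the current NAL
    let mut i = 0;
    while i + 3 <= data.len() {
        if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 {
            if let Some(s) = start {
                nals.extend(trimmed(data, s, i));
            }
            start = Some(i + 3);
            i += 3;
        } else {
            i += 1;
        }
    }
    if let Some(s) = start {
        nals.extend(trimmed(data, s, data.len()));
    }
    nals
}

/// Concatenates NAL units, each preceded by a 4-byte big-endian length.
/// ASSUMPTION: the avcC box declares a 4-byte length field.
fn to_length_prefixed(nals: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for nal in nals {
        out.extend_from_slice(&(nal.len() as u32).to_be_bytes());
        out.extend_from_slice(nal);
    }
    out
}
```

With helpers like these, the bytes handed to MediaDataBox would be the output of to_length_prefixed rather than the raw slices from the Annex B file, and each sample size would include the 4 prefix bytes of every NAL unit in the sample.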

Stonks3141 commented 9 months ago

Hi! I'm sorry, I never really learned how H.264 works and I've barely touched this code in the last year, so I can't really help you with your problem. I do know that generally, fragmented MP4 media segments will consist of many NAL units over several seconds of video and your code appears to be creating one media segment per NAL unit. Would you mind explaining why you want to use the mp4-stream/bmff crates?
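
To illustrate the point about one media segment per NAL unit: an MP4 sample normally corresponds to one access unit (one coded picture plus its parameter sets and SEI), and a media segment carries many such samples. The sketch below groups NAL units, with start codes already removed, into access units using a simplified heuristic and not the full boundary rules of the H.264 specification. The function names are hypothetical, and the 4-byte length prefix in sample_size is an assumption that has to match the avcC box.

```rust
/// Returns the 5-bit nal_unit_type from the NAL header byte.
fn nal_type(nal: &[u8]) -> u8 {
    nal.first().map_or(0, |b| b & 0x1f)
}

/// Types 1..=5 are coded slice data (VCL NAL units).
fn is_vcl(t: u8) -> bool {
    (1..=5).contains(&t)
}

/// A slice starts a new picture when first_mb_in_slice is 0. That field
/// is the first Exp-Golomb value of the slice header, and 0 is coded as
/// a single '1' bit, so the top bit of the byte after the header is set.
fn starts_new_picture(nal: &[u8]) -> bool {
    is_vcl(nal_type(nal)) && nal.get(1).map_or(false, |b| b & 0x80 != 0)
}

/// Groups NAL units into access units (simplified heuristic).
/// A boundary is placed before a slice that starts a new picture, or
/// before an SEI, SPS, PPS, or AUD (types 6..=9) that follows slice data.
fn group_access_units<'a>(nals: &[&'a [u8]]) -> Vec<Vec<&'a [u8]>> {
    let mut units: Vec<Vec<&'a [u8]>> = Vec::new();
    let mut current: Vec<&'a [u8]> = Vec::new();
    let mut seen_vcl = false; // does `current` already hold slice data?
    for &nal in nals {
        let t = nal_type(nal);
        let boundary = if is_vcl(t) {
            seen_vcl && starts_new_picture(nal)
        } else {
            seen_vcl && matches!(t, 6..=9)
        };
        if boundary {
            units.push(std::mem::take(&mut current));
            seen_vcl = false;
        }
        if is_vcl(t) {
            seen_vcl = true;
        }
        current.push(nal);
    }
    if !current.is_empty() {
        units.push(current);
    }
    units
}

/// Size in bytes of one sample once every NAL unit gets a length prefix.
/// ASSUMPTION: 4-byte prefixes, matching the avcC box.
fn sample_size(unit: &[&[u8]]) -> u32 {
    unit.iter().map(|n| 4 + n.len() as u32).sum()
}
```

Under these assumptions, a segment covering for example one second of video would pass one sample_size entry per access unit to the trun box and concatenate the corresponding length-prefixed bytes into a single mdat, instead of emitting a moof/mdat pair for every NAL unit.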