axiomatic-systems / Bento4

Full-featured MP4 format, MPEG DASH, HLS, CMAF SDK and tools
http://www.bento4.com

How to read H.264 data from V4L2 and write MP4 file #339

Open midnightoil opened 5 years ago

midnightoil commented 5 years ago

I'd like to use Bento4 together with Video For Linux 2 to read raw H.264 frames from a hardware encoder and write them to an MP4 container file.

The HW interface and the V4L2 part work fine for writing a raw YUV file, but the Bento4 docs are not clear on how to do this. Using AP4_MemoryByteStream seems like the right direction. The docs say:

The class AP4_File represents all the information about an MP4 file. Internally, a tree of AP4_Atom objects plus other helper objects holds the actual information.

To create an instance of the class, the caller must pass a reference to an AP4_ByteStream object that represents the file data storage. The SDK includes two subclasses of the abstract AP4_ByteStream class: AP4_FileByteStream for reading/writing disk-based files and AP4_MemoryByteStream for working with in-memory file images.

The sample application that translates H.264 to MP4, Mp4Mux.cpp, is the right idea, but it is not evident how to rework it to take data little by little...

The HW encoder outputs H.264 frames via V4L2 like this:

while (running)
{
    // Dequeue an encoded video frame from the driver
    if (ioctl(mDevFd, VIDIOC_DQBUF, &v4l2Buffer) < 0) break;
    writeBufferToMp4OutputFile(v4l2Buffer);
    // Return the buffer to the driver's queue
    ioctl(mDevFd, VIDIOC_QBUF, &v4l2Buffer);
}
closeOutputFile();

How would I change writeBufferToMp4OutputFile to write to an AP4_MemoryByteStream instead, and then write out the .MP4 output file when done?

barbibulle commented 5 years ago

mp4mux.cpp is indeed the right place to look for an example that's pretty close to what you want to do. You'll need to do something similar to what's done in AddH264Track, but instead of reading the sample data from a file, you'd read it from your source of H.264 data. I'm assuming here that you want to do this "on the fly" (i.e. not write the entire stream to disk and then package it; if you wanted to do that, you could simply write the raw H.264 stream to a file, by saving your buffer to disk instead of calling writeBufferToMp4OutputFile).

To do this "on the fly", you can take the buffer from ioctl(mDevFd, VIDIOC_DQBUF, &v4l2Buffer); and use it to feed the H.264 parser (look at the call to parser.Feed() in AddH264Track). The difficult part is that an MP4 file created this way requires all the samples to be known before you can start writing to disk, which means that in your case they'll all be held in memory. If you want to avoid that, I would recommend that you write your buffers to disk first, and then just reference the data on disk (that's what the SampleFileStorage class does in mp4mux.cpp).

midnightoil commented 5 years ago

Gilles, thanks for the response. I'll look into it. Indeed, I want to do it on the fly within the process; otherwise, if I wrote to a file and used an external process, I could just run an ffmpeg command line via system().

It's an embedded system with limited disk. How about skipping the first N seconds of H.264 frames to get the samples, and then starting to feed the parser?

The input comes from a hardware encoder I control, so it is always known: one video stream of H.264 NALs and one stream of 16-bit stereo PCM audio samples.

barbibulle commented 5 years ago

If you want to write the output MP4 file on the fly, as the frames are encoded, that's totally possible, but you'll need to create a slightly different flavor of MP4 file than the one produced by mp4mux. You would need to create what's called a 'fragmented' MP4 file. That's a file that starts like a normal MP4 file, with a 'moov' box that contains the metadata for the file (track descriptors, etc.), followed by fragments of media. Each media fragment is a small, self-contained group of media samples (typically a few seconds). For an example, the application mp4fragment shows how to convert a standard MP4 file into a fragmented one. That app is more complex than what you need (because it has to detect and align audio/video samples), but it does show how the API can be used to create a fragmented MP4 file.

One thing to keep in mind: for maximum compatibility, you should know, when you write the start of the file, the 'sample description' for your media stream (resolution, SPS, PPS, etc.). It is possible to carry the SPS and PPS only inline with the media in the fragments, but I would recommend also putting them in the moov sample descriptions if you can (I'm assuming those don't change over time). In that scenario, you'd construct the moov first, write it to disk, then write fragments to disk one at a time (assuming you can buffer 2+ seconds worth of video frames in memory for each fragment).

The H.264/H.265 parser doesn't need any minimum buffer size to work; you can feed it data in any amount (even byte by byte if you want), as it is fully incremental. So feel free to feed it data as soon as it comes from your encoder. As soon as the parser finds a frame, that frame is returned to you, and you can keep going for as long as you want (i.e. keep the same parser for your entire session; no need to reset anything between frames or fragments).