LukasBommes / mv-extractor

Extract frames and motion vectors from H.264 and MPEG-4 encoded video.
MIT License

Combine I/B/P frames and motion vectors back to video? #10

Closed joeyballentine closed 2 years ago

joeyballentine commented 2 years ago

Hello. I'm thinking about using this in a machine learning project, and I'm wondering whether I'd be able to extract all this information, do some processing on it, and then combine it back into video. As far as I know, there's currently no way to do this.

LukasBommes commented 2 years ago

Hey JoeyBallentine,

what information exactly do you want to extract? The library provides the decoded frame, the motion vectors and the frame type (I/P/B) for each frame in the stream.

Combining the I/P/B frames and motion vectors to full video frames is exactly what the decoder does when playing the stream. For details on this, I can recommend the book "The H.264 Advanced Video Compression Standard" by Iain E. Richardson (ISBN: 978-0-470-51692-8).
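To make the decoder's role more concrete, here is a minimal sketch of block-based motion compensation in plain Python. It is not this library's API and not a real H.264 decoder: the function name `motion_compensate` and the vector tuple layout `(block_w, block_h, src_x, src_y, dst_x, dst_y)` with top-left block corners are assumptions made for illustration. It also shows why motion vectors alone are not enough: a real decoder adds a decoded residual on top of the prediction, and that residual is passed in here as a hypothetical `residual` argument.

```python
def motion_compensate(reference, vectors, residual=None):
    """Predict a grayscale frame from `reference` using block motion vectors.

    reference: 2D list of ints in [0, 255] (the already decoded reference frame).
    vectors:   iterable of (block_w, block_h, src_x, src_y, dst_x, dst_y),
               a hypothetical layout where (src_x, src_y) is the top-left corner
               of the block in the reference frame and (dst_x, dst_y) is the
               top-left corner of the block in the frame being predicted.
    residual:  optional 2D list of signed ints added after prediction.
    """
    height, width = len(reference), len(reference[0])
    # Simplification: pixels not covered by any block keep the reference value.
    predicted = [row[:] for row in reference]
    for block_w, block_h, src_x, src_y, dst_x, dst_y in vectors:
        for dy in range(block_h):
            for dx in range(block_w):
                sy, sx = src_y + dy, src_x + dx
                ty, tx = dst_y + dy, dst_x + dx
                inside_src = 0 <= sy < height and 0 <= sx < width
                inside_dst = 0 <= ty < height and 0 <= tx < width
                if inside_src and inside_dst:
                    # Always read from the untouched reference, not from `predicted`.
                    predicted[ty][tx] = reference[sy][sx]
    if residual is not None:
        # Add the residual and clamp back to the valid 8-bit range.
        predicted = [
            [min(255, max(0, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)
        ]
    return predicted


# Toy usage: a 2x2 bright block moves from the top-left to the bottom-right.
reference = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
vectors = [(2, 2, 0, 0, 2, 2)]
predicted = motion_compensate(reference, vectors)
```

In this toy case the bright block shows up at its new position, but the old position still holds the stale values. Correcting that is the job of the residual (or of an intra-coded block), which is why a frame cannot be rebuilt exactly from the previous frame and the motion vectors alone.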

joeyballentine commented 2 years ago

My goal was to extract these things, feed them into a custom machine learning model, and have it spit out an upscaled version of everything that can be put back into video form.

Normally, machine learning video networks work on extracted frames, either as individual images or as sequences of images. My goal was to work with the raw components of the video instead, but maybe that's just not feasible.

LukasBommes commented 2 years ago

I see. You can certainly train a neural network or other ML algorithm on the motion vectors. In fact, I have tried that in the past for object tracking, but I didn't get the model to train properly. There are works on exactly this, though. For example, this object tracker: https://ieeexplore.ieee.org/document/8734056/ There are also other works that use extracted motion vectors for action recognition and object detection.

LukasBommes commented 2 years ago

What I haven't seen so far, but what would be very useful, is an algorithm that can track instance segmentation masks in a video based on motion vectors. The masks could either be tracked rigidly or be deformed to fit the object.
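As a rough illustration of the rigid variant, the sketch below shifts a binary mask by the median displacement of the blocks that start inside it. This is only a toy under stated assumptions, not an existing algorithm from this library: the function name `shift_mask_rigidly` and the vector layout `(block_w, block_h, src_x, src_y, dst_x, dst_y)` with top-left corners are hypothetical, and it assumes every vector references the immediately preceding frame, which does not hold in general (B frames, multiple reference frames).

```python
from statistics import median


def shift_mask_rigidly(mask, vectors):
    """Translate a binary mask by the median motion of the blocks inside it.

    mask:    2D list of bools for the previous frame.
    vectors: iterable of (block_w, block_h, src_x, src_y, dst_x, dst_y),
             a hypothetical layout with top-left corners, where src is the
             block position in the previous frame and dst in the current one.
    """
    height, width = len(mask), len(mask[0])
    shifts_x, shifts_y = [], []
    for block_w, block_h, src_x, src_y, dst_x, dst_y in vectors:
        # Use the block center in the previous frame to decide membership.
        cx, cy = src_x + block_w // 2, src_y + block_h // 2
        if 0 <= cy < height and 0 <= cx < width and mask[cy][cx]:
            shifts_x.append(dst_x - src_x)
            shifts_y.append(dst_y - src_y)
    if not shifts_x:
        # No motion information for this object: keep the mask where it was.
        return [row[:] for row in mask]
    # The median is less sensitive to a few outlier vectors than the mean.
    shift_x, shift_y = round(median(shifts_x)), round(median(shifts_y))
    shifted = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                ny, nx = y + shift_y, x + shift_x
                if 0 <= ny < height and 0 <= nx < width:
                    shifted[ny][nx] = True
    return shifted
```

A deformable variant would instead move each mask pixel by the vector of the block it falls in, at the cost of holes and overlaps that would then need to be cleaned up.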

LukasBommes commented 2 years ago

Closing due to inactivity. Feel free to reopen for further discussion.

ChenyangQiQi commented 6 months ago

@LukasBommes Thanks for the great work. I would also like to know whether there is an algorithm that takes the I-frames and motion vectors as input and reconstructs the original video with dimensions [time, height, width, channel].