Closed by joeyballentine 2 years ago
Hey JoeyBallentine,
what information exactly do you want to extract? The library provides the decoded frame, the motion vectors and the frame type (I/P/B) for each frame in the stream.
Combining the I/P/B frames and motion vectors to full video frames is exactly what the decoder does when playing the stream. For details on this, I can recommend the book "The H.264 Advanced Video Compression Standard" by Iain E. Richardson (ISBN: 978-0-470-51692-8).
My goal was to extract these things, feed them into a custom machine learning model, and have it spit out an upscaled version of everything that can be put back into video form.
Normally, machine learning video networks work on extracted frames, either as individual images or as sequences of images. My goal was to work with the raw components of the video instead, but maybe that's just not feasible.
I see. You can certainly train a neural network or other ML algorithm on the motion vectors. In fact, I tried that in the past for the task of object tracking, although I didn't get the model to train properly. But there is published work on exactly this, for example this object tracker: https://ieeexplore.ieee.org/document/8734056/ There are other works that use extracted motion vectors for action recognition and object detection.
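To feed motion vectors into a neural network, a common first step is to rasterize the sparse per-block vectors into a dense per-pixel flow tensor. The sketch below assumes the motion vector array follows FFmpeg's AVMotionVector layout as exposed by mv-extractor (one row per vector: source, w, h, src_x, src_y, dst_x, dst_y, motion_x, motion_y, motion_scale); that column order is an assumption, so check it against the library you use.

```python
import numpy as np

def mvs_to_flow(motion_vectors, height, width):
    """Rasterize sparse block motion vectors into a dense (H, W, 2) flow field.

    Assumes each row has the AVMotionVector-style layout:
    [source, w, h, src_x, src_y, dst_x, dst_y, motion_x, motion_y, motion_scale].
    """
    flow = np.zeros((height, width, 2), dtype=np.float32)
    for source, w, h, src_x, src_y, dst_x, dst_y, mx, my, scale in motion_vectors:
        # Displacement in pixels: motion_x/motion_y are in units of 1/motion_scale.
        dx = mx / scale
        dy = my / scale
        # Paint the displacement over the block centered on (dst_x, dst_y).
        x0 = int(max(dst_x - w // 2, 0))
        y0 = int(max(dst_y - h // 2, 0))
        flow[y0:y0 + int(h), x0:x0 + int(w)] = (dx, dy)
    return flow
```

The resulting (H, W, 2) array can be stacked with the decoded frame along the channel axis as network input, the same way optical-flow channels are used in two-stream action recognition models.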
What I haven't seen so far, but what would be very useful, is an algorithm that can track instance segmentation masks in a video based on motion vectors. The masks could be either rigidly tracked or deformed to fit the object.
Closing due to inactivity. Feel free to reopen for further discussion.
@LukasBommes Thanks for the great work. I also want to know whether there is an algorithm that can take "I-frames, motion vectors" as input and then reconstruct the original video with dimensions [time, height, width, channel].
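The prediction step of such a reconstruction is block-based motion compensation: each block of the current frame is copied from a displaced block in the reference frame. The sketch below uses a simplified, hypothetical layout where motion_vectors[i, j] holds the (dy, dx) offset for the block at grid position (i, j); a real H.264 decoder additionally adds a per-block residual, handles sub-pixel interpolation, and supports variable block sizes, none of which is shown here.

```python
import numpy as np

def motion_compensate(reference, motion_vectors, block=16):
    """Predict the next frame from a reference frame and block motion vectors.

    motion_vectors[i, j] is the (dy, dx) offset of the source block in the
    reference frame for the block at grid cell (i, j). This is only the
    prediction step; a real decoder also adds the encoded residual.
    """
    h, w = reference.shape[:2]
    predicted = np.zeros_like(reference)
    for i in range(0, h, block):
        for j in range(0, w, block):
            dy, dx = motion_vectors[i // block, j // block]
            # Clamp the source block so it stays inside the reference frame.
            sy = int(np.clip(i + dy, 0, h - block))
            sx = int(np.clip(j + dx, 0, w - block))
            predicted[i:i + block, j:j + block] = reference[sy:sy + block, sx:sx + block]
    return predicted
```

Stacking the I-frame and the successively predicted frames along a new leading axis would give the [time, height, width, channel] tensor asked about, but without the residuals the result is only the motion-compensated prediction, not the exact original video.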
Hello. I'm thinking about using this in a machine learning project, and I'm wondering whether I'd be able to extract all this information, do some processing on it, and then combine it back into a video? As far as I know, there's currently no way to do this.