PeterL1n / RobustVideoMatting

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
https://peterl1n.github.io/RobustVideoMatting/
GNU General Public License v3.0

Synchronization issues between inferred mask and original video #131

Closed italosalgado14 closed 2 years ago

italosalgado14 commented 2 years ago

Hello! Thanks for the code. I have been having timing issues between the inferred mask output and the original video. I compared them by converting both my original video and the mask output video to frames; both yield the same number of frames, so the difference may be caused by a misconfiguration on my side. My original video is 30 fps at 1080x1920. I would appreciate any suggestion.

italosalgado14 commented 2 years ago

Erratum: this is the code I used for inference.

```python
import torch
import sys

print("RVM will run on: " + sys.argv[1])

model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").cuda()  # or "resnet50"
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")

convert_video(
    model,                           # The loaded model, can be on any device (cpu or cuda).
    input_source=sys.argv[1],        # A video file or an image sequence directory.
    downsample_ratio=None,           # [Optional] If None, make downsampled max size be 512px.
    output_type='video',             # Choose "video" or "png_sequence".
    output_composition='com.mp4',    # File path if video; directory path if png sequence.
    output_alpha="pha.mp4",          # [Optional] Output the raw alpha prediction.
    output_foreground="fgr.mp4",     # [Optional] Output the raw foreground prediction.
    output_video_mbps=4,             # Output video mbps. Not needed for png sequence.
    seq_chunk=12,                    # Process n frames at once for better parallelism.
    num_workers=1,                   # Only for image sequence input. Reader threads.
    progress=True                    # Print conversion progress.
)
```

PeterL1n commented 2 years ago

They are not aligned because of a bug in pyav: the output video may have a different fps than the original. For example, if your original video is 29.97 fps, the output gets rounded to 30 fps, so the two drift apart if you play them back directly. But if they have the same number of frames, you should apply the mask frame by frame.
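Applying the mask frame by frame, as suggested, sidesteps any container-level fps mismatch entirely. A minimal sketch of per-frame alpha compositing (`composite_frame` is a hypothetical helper; it assumes frames have already been decoded to NumPy arrays, e.g. from the `pha.mp4` output, with OpenCV or PyAV):

```python
import numpy as np

def composite_frame(frame, alpha, background):
    """Alpha-blend one foreground frame over a background using the matte.

    frame, background: HxWx3 uint8 images; alpha: HxW uint8 matte (0..255),
    e.g. one grayscale frame decoded from the pha.mp4 alpha output.
    """
    a = alpha.astype(np.float32)[..., None] / 255.0  # HxWx1, values in [0, 1]
    out = frame.astype(np.float32) * a + background.astype(np.float32) * (1.0 - a)
    return out.astype(np.uint8)

# Tiny demo on synthetic data: a fully opaque matte keeps the foreground,
# a fully transparent matte keeps the background.
fg = np.full((4, 4, 3), 200, dtype=np.uint8)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
opaque = np.full((4, 4), 255, dtype=np.uint8)
transparent = np.zeros((4, 4), dtype=np.uint8)
result = composite_frame(fg, opaque, bg)
```

Because frame `i` of the matte is paired with frame `i` of the source by index, the declared fps of either file never enters the computation.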

italosalgado14 commented 2 years ago

Thanks for the reply! Indeed my video is at almost 30 fps (29.4), which could cause the error you mention. But I get the same number of frames from the mask video and from my original video (converting both videos to frames with Python). I will preprocess the video to bring it to exactly 30 fps and report back here in case the solution works for someone else.

italosalgado14 commented 2 years ago

Everything is OK now! I ran an ffmpeg command over the original video to set the frame rate to an integer 30 fps, and it works. Thanks!
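For reference, the re-encode step can be scripted from Python. This sketch only builds the command line; the filenames and the choice of the `fps` filter (with `-c:a copy` to leave audio untouched) are illustrative assumptions, since the exact ffmpeg invocation was not posted, and ffmpeg must be on PATH to actually run it:

```python
import subprocess

def cfr_command(src, dst, fps=30):
    """Build an ffmpeg command that forces a constant integer frame rate.

    src/dst are placeholder paths; "-filter:v fps=30" resamples the video
    stream to exactly 30 fps, and "-c:a copy" passes the audio through.
    """
    return ["ffmpeg", "-y", "-i", src, "-filter:v", f"fps={fps}", "-c:a", "copy", dst]

cmd = cfr_command("original.mp4", "original_30fps.mp4")
# subprocess.run(cmd, check=True)  # uncomment once ffmpeg is installed
```

Running the matting script on the re-encoded file then produces a matte whose timestamps line up with the source on direct playback.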