ControlNet / LAV-DF

[CVIU] Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
https://www.sciencedirect.com/science/article/pii/S1077314223001984

Frame-level processing #12

Closed javadmozaffari closed 3 months ago

javadmozaffari commented 9 months ago

Hello,

In this new Temporal Forgery Localization model, the entire video is used as input. The existing model proposed by the authors has shown promise in achieving accurate results. However, taking the whole video as input may pose a challenge in terms of memory consumption, especially for large datasets or videos with high-resolution frames. Would it be possible to modify the Temporal Forgery Localization model to accept individual frames instead of the entire video? This would reduce the amount of RAM required.

ControlNet commented 8 months ago

Hi, sorry for the late reply.

I think it might be hard, because the boundary matching mechanism requires all frames as input. But to save memory, I think you can try 2 ways to reduce the temporal size.

  1. Sample frames with a stride for each video. For example, only use the 1st, 3rd, 5th, 7th, ... frames, so you will have fewer frames.
  2. Interpolate the temporal axis to a fixed length for each video. For example, no matter how long the video is, resize it to 100 frames.

But the pretrained model was not trained with this preprocessing, so it might not perform well if you use either approach for evaluation.
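The two preprocessing options above can be sketched in PyTorch as follows. This is a minimal illustration, not code from the repository: it assumes the video is already loaded as a tensor of shape `(T, C, H, W)`, and the function names `strided_sample` and `resize_temporal` are hypothetical.

```python
import torch
import torch.nn.functional as F

def strided_sample(frames: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Option 1: keep every `stride`-th frame along the temporal axis (dim 0)."""
    return frames[::stride]

def resize_temporal(frames: torch.Tensor, target_len: int = 100) -> torch.Tensor:
    """Option 2: interpolate the temporal axis to a fixed length.

    Expects frames of shape (T, C, H, W); returns (target_len, C, H, W).
    """
    t, c, h, w = frames.shape
    # F.interpolate resizes the last dim of a (N, C, L) tensor,
    # so move the temporal axis to the end before interpolating.
    x = frames.reshape(t, -1).T.unsqueeze(0)  # (1, C*H*W, T)
    x = F.interpolate(x, size=target_len, mode="linear", align_corners=False)
    return x.squeeze(0).T.reshape(target_len, c, h, w)

video = torch.randn(250, 3, 96, 96)        # a dummy 250-frame clip
short = strided_sample(video, stride=2)    # -> (125, 3, 96, 96)
fixed = resize_temporal(video, 100)        # -> (100, 3, 96, 96)
```

Either output can then be fed to the model in place of the full-length video; as noted above, the released checkpoint was not trained with this preprocessing, so scores may degrade.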