ControlNet / LAV-DF

[CVIU] Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
https://www.sciencedirect.com/science/article/pii/S1077314223001984

Metadata generation #6

Closed: TimWalita closed this issue 1 year ago

TimWalita commented 1 year ago

Hello, I want to test your code on my own videos. Is there a way to generate the required metadata automatically for my own videos? I.e. how did you generate the metadata for LAV-DF? Do you have code for this?

ControlNet commented 1 year ago

Hi, the metadata format is a JSON file. Please follow the description below to build the metadata for your videos.
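A minimal sketch of how such a metadata file might be assembled for custom videos. The field names below (`file`, `fake_periods`, etc.) are assumptions modeled on typical forgery-localization metadata, not the repo's confirmed schema; check the keys in LAV-DF's actual `metadata.json` before relying on them.

```python
import json

def build_entry(file_path, duration, fake_periods, split="test"):
    """Build one metadata entry for a video.

    Assumed layout: fake_periods is a list of [start, end] pairs in
    seconds marking the forged segments. Field names are illustrative.
    """
    return {
        "file": file_path,                      # relative path to the video
        "split": split,                         # subset this video belongs to
        "duration": duration,                   # video length in seconds
        "fake_periods": fake_periods,           # [start, end] fake segments
        "n_fakes": len(fake_periods),           # number of fake segments
        "modify_video": len(fake_periods) > 0,  # any visual modification
        "modify_audio": len(fake_periods) > 0,  # any audio modification
    }

# Example: one custom clip with a single fake segment from 3.2s to 5.0s.
entries = [build_entry("my_videos/clip_001.mp4", 12.4, [[3.2, 5.0]])]
with open("metadata.json", "w") as f:
    json.dump(entries, f, indent=2)
```

For fully real videos, `fake_periods` would simply be an empty list, which makes `n_fakes` zero and both modification flags false.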

TimWalita commented 1 year ago

Hi, thank you for your quick answer. Do you have code to collect most of this information automatically? It would be a pain to do this manually for a lot of videos. Also, what if I don't know the fake periods? Can I just use a face detector to determine the sequences where any face occurs and then figure out somehow whether these sequences are fake or not? Or would this hurt the detection accuracy if most of those faces are not fakes?

ControlNet commented 1 year ago

> Do you have the code to collect most of these information automatically?

Of course, everything can be done automatically. If you assume each of your videos is fully fake or fully real and want a classification task, please refer to #4.

> Also, what if I don't know the fake periods?

If you only want to run inference on some videos, you can load the model directly, write a similar dataloader, and forward the data through the model. The model will output a boundary map, and the postprocessing code provided in the repo can turn it into segment predictions.
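The boundary-map-to-segments step mentioned above can be sketched roughly as follows. This is a simplified stand-in, not the repo's actual postprocessing code: the map layout (`score_map[d, s]` = confidence that a fake segment starts at frame `s` and lasts `d + 1` frames), the frame rate, and the function name are all assumptions for illustration.

```python
import numpy as np

def boundary_map_to_segments(score_map, fps=25.0, threshold=0.5, top_k=5):
    """Convert a (max_duration, num_frames) boundary map into
    (start_sec, end_sec, confidence) segment predictions."""
    durations, starts = np.nonzero(score_map >= threshold)
    confidences = score_map[durations, starts]
    order = np.argsort(confidences)[::-1][:top_k]  # keep top-k by confidence
    segments = []
    for i in order:
        start = starts[i] / fps
        end = (starts[i] + durations[i] + 1) / fps
        segments.append((float(start), float(end), float(confidences[i])))
    return segments

# Toy example: one strong candidate starting at frame 10 lasting 11 frames.
m = np.zeros((32, 64))
m[10, 10] = 0.9
print(boundary_map_to_segments(m))
```

A real postprocessing pipeline would typically also apply non-maximum suppression to merge overlapping candidates; this sketch just keeps the top-k thresholded entries.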

If you want to train the model on some dataset, the labels are required.

> Can I just use a face detector to determine the sequences where any face occurs and figure out somehow whether these sequences are fake or not? Or would this hurt the detection accuracy if most of those faces are not fakes?

This method only takes the cropped facial video as input, so it cannot do that kind of group analysis.