ControlNet / LAV-DF

[CVIU] Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
https://www.sciencedirect.com/science/article/pii/S1077314223001984

Metadata generation #6

Closed: TimWalita closed this issue 1 year ago

TimWalita commented 1 year ago

Hello, I want to test your code on my own videos. Is there a way to generate the required metadata automatically for my own videos? I.e. how did you generate the metadata for LAV-DF? Do you have code for this?

ControlNet commented 1 year ago

Hi, the metadata format is a JSON file. Please follow the description below to build the metadata for your videos.
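A minimal sketch of how such a metadata file might be assembled for custom videos. The field names below (`file`, `fake_periods`, etc.) are assumptions modeled on typical forgery-localization metadata, not the repo's confirmed schema; check the keys in LAV-DF's actual `metadata.json` before relying on them.

```python
import json

def build_entry(file_path, duration, fake_periods, split="test"):
    """Build one metadata entry for a video.

    Assumed layout: fake_periods is a list of [start, end] pairs in
    seconds marking the forged segments. Field names are illustrative.
    """
    return {
        "file": file_path,                      # relative path to the video
        "split": split,                         # subset this video belongs to
        "duration": duration,                   # video length in seconds
        "fake_periods": fake_periods,           # [start, end] fake segments
        "n_fakes": len(fake_periods),           # number of fake segments
        "modify_video": len(fake_periods) > 0,  # any visual modification
        "modify_audio": len(fake_periods) > 0,  # any audio modification
    }

# Example: one custom clip with a single fake segment from 3.2s to 5.0s.
entries = [build_entry("my_videos/clip_001.mp4", 12.4, [[3.2, 5.0]])]
with open("metadata.json", "w") as f:
    json.dump(entries, f, indent=2)
```

For fully real videos, `fake_periods` would simply be an empty list, which makes `n_fakes` zero and both modification flags false.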

TimWalita commented 1 year ago

Hi, thank you for your quick answer. Do you have code to collect most of this information automatically? It would be a pain to do this manually for a lot of videos. Also, what if I don't know the fake periods? Can I just use a face detector to determine the sequences where any face occurs and then figure out somehow whether these sequences are fake or not? Or would this hurt the detection accuracy if most of those faces are not fakes?

ControlNet commented 1 year ago

> Do you have the code to collect most of these information automatically?

Of course, everything can be done automatically. If you assume each of your videos is fully fake or fully real and want a classification task, please refer to #4.

> Also, what if I don't know the fake periods?

If you only want to run inference on some videos, you can load the model directly, write a similar dataloader, and forward the data through the model. The model will output a boundary map, and the postprocessing code provided in the repo can turn it into segment predictions.
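The boundary-map-to-segments step mentioned above can be sketched roughly as follows. This is a simplified stand-in, not the repo's actual postprocessing code: the map layout (`score_map[d, s]` = confidence that a fake segment starts at frame `s` and lasts `d + 1` frames), the frame rate, and the function name are all assumptions for illustration.

```python
import numpy as np

def boundary_map_to_segments(score_map, fps=25.0, threshold=0.5, top_k=5):
    """Convert a (max_duration, num_frames) boundary map into
    (start_sec, end_sec, confidence) segment predictions."""
    durations, starts = np.nonzero(score_map >= threshold)
    confidences = score_map[durations, starts]
    order = np.argsort(confidences)[::-1][:top_k]  # keep top-k by confidence
    segments = []
    for i in order:
        start = starts[i] / fps
        end = (starts[i] + durations[i] + 1) / fps
        segments.append((float(start), float(end), float(confidences[i])))
    return segments

# Toy example: one strong candidate starting at frame 10 lasting 11 frames.
m = np.zeros((32, 64))
m[10, 10] = 0.9
print(boundary_map_to_segments(m))
```

A real postprocessing pipeline would typically also apply non-maximum suppression to merge overlapping candidates; this sketch just keeps the top-k thresholded entries.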

If you want to train the model on some dataset, the labels are required.

> Can I just use a face detector to determine the sequences where any face occurs and figure out somehow whether these sequences are fake or not? Or would this hurt the detection accuracy if most of those faces are not fakes?

This method only takes the cropped facial video as input, so it cannot do that kind of group analysis.