ControlNet / AV-Deepfake1M

[ACM MM] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
https://arxiv.org/abs/2311.15308

Question about the source of proposed dataset #6

Closed: XuecWu closed this issue 5 months ago

XuecWu commented 5 months ago

Thank you for your exciting work! As the title says, I would like to know the source of the proposed dataset. I have read the full paper carefully, but I could not find any information about where the videos in AV-Deepfake1M come from.

Looking forward to your reply! Best regards,

ControlNet commented 5 months ago

Hi,

From Section 3.1,

The three-stage pipeline for generating content-driven deepfakes is illustrated in Figure 2. A subset of real videos from the Voxceleb2 [14] dataset is pre-processed to extract the audio using FFmpeg [47], followed by Whisper-based [41] real transcript generation.
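For readers reproducing this preprocessing step, here is a minimal sketch of "extract audio with FFmpeg, then transcribe with Whisper". It assumes the `ffmpeg` binary is on PATH and the `openai-whisper` package is installed. The 16 kHz mono settings, the `base` model size, and the helper names (`build_ffmpeg_audio_cmd`, `extract_audio`, `transcribe`) are illustrative assumptions, not the authors' actual configuration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_audio_cmd(video: Path, out_wav: Path, sample_rate: int = 16000) -> list:
    """Build an FFmpeg command that drops the video stream and writes a mono WAV file.

    The sample rate and mono downmix are assumptions chosen to suit Whisper.
    """
    return [
        "ffmpeg", "-y",           # overwrite the output file without prompting
        "-i", str(video),         # input video, e.g. one VoxCeleb2 clip
        "-vn",                    # discard the video stream
        "-ac", "1",               # downmix to a single audio channel
        "-ar", str(sample_rate),  # resample to the requested rate
        str(out_wav),             # .wav output defaults to 16-bit PCM
    ]


def extract_audio(video: Path, out_wav: Path) -> Path:
    """Run FFmpeg (requires the ffmpeg binary on PATH); raises on failure."""
    subprocess.run(build_ffmpeg_audio_cmd(video, out_wav), check=True)
    return out_wav


def transcribe(wav: Path, model_name: str = "base") -> str:
    """Transcribe audio with openai-whisper (assumed installed)."""
    import whisper  # imported lazily so the command builder works without it

    model = whisper.load_model(model_name)
    return model.transcribe(str(wav))["text"]
```

Usage, under the same assumptions: `transcribe(extract_audio(Path("clip.mp4"), Path("clip.wav")))` returns the transcript text for one clip.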

XuecWu commented 5 months ago

Got it. Could you tell me the specific number of videos sampled from VoxCeleb2? This is important for my work. Thank you!

ControlNet commented 5 months ago

All the real samples come from VoxCeleb2, i.e., 286,721 videos.

XuecWu commented 5 months ago

Got it. Thank you!