Were the models behind the performance metrics in the paper (Table 6, Temporal deepfake localization benchmark) trained on the AV-Deepfake1M dataset? Or do the numbers come from direct inference with the pre-trained models released by the previous methods?
I tried running inference with UMMAFormer on AV-Deepfake1M, and even though I only used a portion of the data, the performance is far below what is shown in Table 6.
Thanks!