Question about the performance on AV-Deepfake1M

Were the methods behind the metrics in the paper (Table 6, Temporal deepfake localization benchmark) trained on the AV-Deepfake1M dataset? Or do the numbers come from direct inference with the pre-trained models released for those previous methods?

I tried running inference with UMMAFormer on AV-Deepfake1M, and although I only used a portion of the data, the performance is far below what Table 6 shows.

Thanks!

