csun22 / Synthetic-Voice-Detection-Vocoder-Artifacts

This repository contains the dataset and detection code from our paper: AI-Synthesized Voice Detection Using Neural Vocoder Artifacts, accepted at the CVPR Workshop on Media Forensics 2023.
https://arxiv.org/abs/2304.13085
MIT License

Clarification Needed on Intra-dataset vs Cross-dataset Evaluation Metrics in Paper #3

Closed — chandlerbing65nm closed this issue 1 month ago

chandlerbing65nm commented 9 months ago

I have some questions regarding the evaluation metrics and results presented in Sections 4.4 and 4.5.

Intra-dataset Evaluation (Section 4.4)

The paper reports a very low EER of 0.19% on the WaveFake dataset using the RawNet2 model.


Cross-dataset Evaluation (Section 4.5)

On the other hand, the EER significantly increased to 26.95% when the model trained on the LibriSeVoc dataset was tested on the WaveFake dataset. This suggests poor generalization to unseen data.


csun22 commented 9 months ago

Hi Chandler, thank you very much for the questions. On your first question: we did split the WaveFake dataset into separate train and test sets. On your second question: that is an excellent idea. We are currently working on it, and in line with your suggestion, we are trying to expose the model to a more diverse set of vocoders during training.