irdance opened 4 years ago
Yeah, I got the same thing; the pre-trained model seems to be inaccurate.
Hi guys, thanks for pointing out this issue. Currently, the pretrained model only works for audio files that fall within the distribution of the input data it was trained on. We deliberately provided hard, out-of-distribution audio files in the 'real' and 'fake' subfolders to show that this work is still in progress. It is very hard to train a generalized model that will work on arbitrary audio files out of the box. It would be great if you could put in some ideas.
Thanks, Sachin
Thanks @ranasac19878. Just for clarity: was the pre-trained model trained on the test dataset? The model does quite well on the test dataset, and the test set does contain 'out of distribution' audio files, since some of the fake audio files in it were generated by different deepfake audio models.
My hunch is that the variety of accents in the dataset (train + test) is limited, and the model therefore may not work well with different accents.
@irdance the model was not trained on the test data set, but the test set was used as a validation set to tune the hyperparameters of the neural network. That is not technically correct, but it was otherwise very difficult to get a model to perform well on the test set, since the distribution of the test set differs from that of the validation set.
Going forward, we will work on making the model more resilient using adversarial training and other data augmentation techniques.
Yes, speech accent is definitely one indication of distribution difference, but there may be other small differences in the distribution, such as the number of pauses and the time between pauses, that the model might have overfitted to given the training data.
Sachin
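To make the augmentation idea above concrete, here is a minimal sketch of two simple waveform augmentations (additive noise and a circular time shift). The function names and parameters are illustrative only, not part of this repo's code:

```python
import random

def add_noise(samples, noise_level=0.005, seed=0):
    """Return a copy of the waveform with small Gaussian noise added.

    `noise_level` is the noise standard deviation; a hypothetical knob,
    not a parameter this project exposes.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_level) for s in samples]

def time_shift(samples, shift):
    """Circularly shift the waveform right by `shift` samples."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift]

# Toy example on a short fake "waveform"
clip = [0.0, 0.1, 0.2, 0.1, 0.0]
noisy = add_noise(clip)
shifted = time_shift(clip, 2)
```

Augmentations like these (plus pitch/tempo perturbation and background noise from other recordings) are one common way to reduce overfitting to incidental features such as pause patterns.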
Hi, I placed all my audio files (both natural and synthesized; 280 in total) in the path "/data/inference_data/unlabeled" and used the pre-trained model for classification. Since I am using terminal mode (Ubuntu), I can't see the "print out with information on predictions of the model, the accuracy of the model on your provided data." However, the result shows likelihood values (correct me if I'm wrong) along with the sentence "The probability of the clip being real is: 0.00%". How should I interpret these results? P.S. I have attached the results as a graph of the likelihood values.
Hi Thaya,
Thanks for the info. Currently, the pretrained model works well only for data it was trained/validated on. If the data distribution changes, this model will default its prediction to 'fake', since the original data had a 1:9 ratio of real to fake audio clips. We are working on training another model that will handle out-of-distribution audio clips in the coming months.
The likelihood value is the model's propensity score for a clip being real or not.
Thanks, Sachin
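For readers wondering how to interpret the printed percentage: the score is the model's estimated probability that a clip is real, and a hard label comes from thresholding it. A minimal sketch (the threshold here is a hypothetical knob, not something `inference.py` exposes):

```python
def label_from_propensity(p_real, threshold=0.5):
    """Map the model's 'probability of being real' score to a hard label.

    With the 1:9 real-to-fake training ratio mentioned above, an
    uncalibrated model on out-of-distribution audio tends to output
    p_real near 0, so nearly every clip gets labelled 'fake'.
    """
    return "real" if p_real >= threshold else "fake"

# The 0.00% case reported above maps to 'fake'
print(label_from_propensity(0.0))
print(label_from_propensity(0.87))
```

So "The probability of the clip being real is: 0.00%" means the model is maximally confident the clip is fake, which, given the distribution shift, does not say much about the clip itself.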
The link to download the ASV data in this project seems to be invalid. Could you provide the data or an updated link in the project?
Hi, when I ran the inference.py file with all the audio files in the 'real' subfolder, it misclassified them as 'fake'. I just wanted to check: is the linked pre-trained model the correct one?