andabi / voice-vector

Deep neural networks for text-independent speaker embeddings, written in TensorFlow
MIT License

How to preprocess VoxCeleb data, and why is acc == 0? #5

Open colinsongf opened 6 years ago

colinsongf commented 6 years ago

How do I process the VoxCeleb data to run train.py?

Why is acc = 0 after 200 iterations on 2 GPUs?

ChristopherLu commented 6 years ago

Hi,

I got the same issue on the voxceleb_v1 dataset. The loss decreases consistently from 7.2 to 6.01, but the eval/train accuracy is always 0. Have you solved this?
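As a rough sanity check (not from the thread): with a softmax over N speakers, a model that predicts uniformly has a cross-entropy of ln N nats. Assuming the 1,251 speakers of VoxCeleb1, that is about 7.13, close to the starting loss of 7.2. A loss of 6.01 still means the model is about as uncertain as a uniform guess over exp(6.01) ≈ 407 speakers, so top-1 accuracy near zero is plausible at that stage. The sketch below computes these numbers; the speaker count is an assumption, so substitute the number of classes in your own split.

```python
import math


def chance_loss(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)


def effective_num_classes(loss: float) -> float:
    """Perplexity: how many equally likely classes would produce this loss."""
    return math.exp(loss)


# Assumption: 1,251 speakers (VoxCeleb1); replace with your own class count.
NUM_SPEAKERS = 1251

print(f"uniform-guess loss: {chance_loss(NUM_SPEAKERS):.2f}")
print(f"chance accuracy:    {1 / NUM_SPEAKERS:.4%}")
for loss in (7.2, 6.01):
    print(f"loss {loss}: ~{effective_num_classes(loss):.0f} equally likely speakers")
```

Perplexity is only a geometric-mean summary of the predicted probabilities, so it hints at how far the model is from confident predictions rather than giving an exact accuracy.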

colinsongf commented 6 years ago

I haven't, sorry.

ChristopherLu commented 6 years ago

Is this caused by a missing preprocessing step for the VoxCeleb data?

colinsongf commented 6 years ago

I think so, but I don't know how to preprocess the VoxCeleb data!

colinsongf commented 6 years ago

How should the VoxCeleb dataset be preprocessed to drop silent segments?
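The thread does not say how (or whether) silence was removed. One common, simple approach is energy-based: split the signal into short frames and keep only the frames whose RMS energy exceeds a threshold. The sketch below assumes 16-bit mono PCM samples already loaded into a Python list; `drop_silence`, the 30 ms frame length, and the RMS threshold of 500 are hypothetical choices to tune, not part of voice-vector.

```python
import math
from typing import List


def drop_silence(samples: List[int], sample_rate: int = 16000,
                 frame_ms: int = 30, rms_threshold: float = 500.0) -> List[int]:
    """Keep only frames whose RMS energy reaches rms_threshold.

    Hypothetical helper: samples are 16-bit PCM values; frame_ms and
    rms_threshold are illustrative defaults, not tuned values.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)  # samples per frame
    voiced: List[int] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= rms_threshold:
            voiced.extend(frame)  # frame is loud enough to count as speech
    return voiced
```

A fixed threshold is crude: quiet speakers or noisy recordings can defeat it. A threshold relative to each file's peak energy, or a dedicated voice activity detector, is usually more robust.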

andabi commented 6 years ago

@ChristopherLu @colinsongf I preprocessed the VoxCeleb dataset to a sample rate of 16,000 Hz, which is the setting in my default.yaml.

ChristopherLu commented 6 years ago

@andabi

Could you share the procedure for producing 'voxceleb_norm'? Is it the data after preprocessing? We are confused about the right steps for running the code on VoxCeleb; it would be great if you could share the recipe or pipeline you used.

Thanks

andabi commented 6 years ago

voxceleb_norm is the processed dataset. It is organized into one directory per celebrity, and each directory contains that celebrity's audio files in WAV format at a sample rate of 16,000 Hz. You need to do this preprocessing before training.
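The layout described above can be sketched with the Python standard library alone. This assumes the source clips are 16-bit mono PCM WAV files; `normalize_file`, `resample_linear`, and the other helpers are hypothetical, and the speaker ID `id10001` used in testing is just an example directory name. Linear interpolation is used for brevity and applies no anti-aliasing filter, so for real preprocessing a proper resampler (e.g. sox or librosa) is the safer choice.

```python
import array
import os
import sys
import wave
from typing import List, Tuple

TARGET_SR = 16000  # the 16,000 Hz sample rate mentioned in the thread


def read_mono_pcm16(path: str) -> Tuple[List[int], int]:
    """Read a 16-bit mono PCM WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError(f"{path}: expected 16-bit mono PCM")
        sr = wf.getframerate()
        data = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        data.byteswap()
    return data.tolist(), sr


def write_mono_pcm16(path: str, samples: List[int], sr: int) -> None:
    """Write samples as a 16-bit mono PCM WAV file."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sr)
        wf.writeframes(data.tobytes())


def resample_linear(samples: List[int], src_sr: int,
                    dst_sr: int = TARGET_SR) -> List[int]:
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = len(samples) * dst_sr // src_sr
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr  # fractional position in the source signal
        j = min(int(pos), last)
        frac = pos - j
        nxt = samples[min(j + 1, last)]
        out.append(int(round(samples[j] * (1 - frac) + nxt * frac)))
    return out


def normalize_file(src_path: str, out_root: str, speaker_id: str) -> str:
    """Resample one clip to TARGET_SR and write it to out_root/<speaker_id>/."""
    samples, sr = read_mono_pcm16(src_path)
    out_dir = os.path.join(out_root, speaker_id)
    os.makedirs(out_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(src_path))[0] + ".wav"
    out_path = os.path.join(out_dir, name)
    write_mono_pcm16(out_path, resample_linear(samples, sr), TARGET_SR)
    return out_path
```

Calling `normalize_file` on every clip, with the speaker's directory name as `speaker_id` and `voxceleb_norm` as `out_root`, yields one folder of 16 kHz WAV files per speaker.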

colinsongf commented 6 years ago

After converting all the wav files to a sample rate of 16,000 Hz, I still get acc = 0. Why?

andabi commented 6 years ago

Keep training for at least a few days, because VoxCeleb is huge. I trained the model for a few days on 8 GPUs to get over 90% eval accuracy.