Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

Asking for clarification #12

Closed decuvillo closed 4 years ago

decuvillo commented 5 years ago

Hi, thank you for the code; it is extremely helpful. I have a few questions:

1) In the paper, it is noted that the reported results cover both speaker identification and verification. I presume that the accuracy stands for the identification result, but I'm still confused, since the speakers in the training set are different from those in the testing set.

2) What is the difference between select_batch.create_data_producer and stochastic_mini_batch?

3) Which variable indicates the number of epochs used?

4) In the paper, it is indicated that 64 utterances are used per mini-batch. Does this correspond to the candidates_per_batch variable, which is set to 640?

5) For running the code (training and testing), is running train.py sufficient? How long does it take to get results?

Thank you in advance

Walleclipse commented 5 years ago

Hi, I apologize for the late reply.

  1. The core of this paper is speaker embedding. If you want to use the embedding for an identification task (classification), you need to ensure that the speakers in the test data were seen in the training data. I think the identification result in the paper is evaluated on the same speakers (mainly for the pretraining procedure). Verification, in contrast, only compares embeddings, so it works on unseen speakers; see the sketch after this list.

  2. random_batch.stochastic_mini_batch selects the negative samples completely at random. In contrast, select_batch.create_data_producer creates batches with hard negative samples (it runs in multiprocessing). You can check issue #11 and issue #8, and there is a minimal sketch of the hard-negative idea after this list.

  3. Sorry, I did not record the number of epochs for the training procedure. I only record the number of steps as grad_steps. You can calculate the epoch as epoch = grad_steps // len(train_data). PS: I do not set a maximum step (or epoch) to terminate training. You can set one yourself; just modify line 117 in train.py.

  4. candidates_per_batch does not need to correspond to 64 utterances. Please check select_batch.py. candidates_per_batch=640 means that in each step there are 640 candidate utterances, from which I select the best 64 (or batch_size) utterances. You can adjust candidates_per_batch yourself.

  5. If you have already collected the data, you just need to run train.py. For this repo, you can simply clone it and run train.py (the data is already prepared in the audio folder). I run the code on a GPU; it takes about 4~5 hours.
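To illustrate point 1, here is a minimal sketch of why verification works on unseen speakers while identification requires enrolled (seen) speakers. The embedding_model, its input shape, and the threshold value are hypothetical placeholders, not this repo's API:

```python
# Minimal sketch (hypothetical API): verification vs. identification with embeddings.
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(embedding_model, utt_a, utt_b, threshold=0.7):
    # Verification: compare two utterances' embeddings against a threshold.
    # No class labels involved, so the speakers need not appear in training.
    emb_a = embedding_model.predict(utt_a[np.newaxis])[0]
    emb_b = embedding_model.predict(utt_b[np.newaxis])[0]
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(embedding_model, utt, enrolled_embs):
    # Identification: the nearest enrolled speaker wins, so every candidate
    # speaker must have enrollment data (i.e. must be "seen") beforehand.
    emb = embedding_model.predict(utt[np.newaxis])[0]
    return max(enrolled_embs, key=lambda name: cosine_similarity(emb, enrolled_embs[name]))
```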
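And for points 2 and 4, a minimal sketch of the hard-negative selection idea, assuming embeddings are already computed. The function and variable names are illustrative, not the repo's; see select_batch.py for the actual logic:

```python
# Minimal sketch of hard-negative selection from a pool of candidates.
import numpy as np

def pick_hard_negatives(anchor_emb, candidate_embs, batch_size=64):
    # From candidates_per_batch candidate embeddings (e.g. 640), keep the
    # batch_size negatives most similar to the anchor: these are the hardest
    # and therefore most informative for the triplet loss.
    sims = candidate_embs @ anchor_emb
    sims /= (np.linalg.norm(candidate_embs, axis=1) * np.linalg.norm(anchor_emb) + 1e-8)
    hardest = np.argsort(sims)[::-1][:batch_size]  # highest similarity first
    return candidate_embs[hardest]

# Example: 640 candidates, select the 64 hardest.
anchor = np.random.randn(512).astype(np.float32)
candidates = np.random.randn(640, 512).astype(np.float32)
hard_batch = pick_hard_negatives(anchor, candidates, batch_size=64)
print(hard_batch.shape)  # (64, 512)
```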

decuvillo commented 5 years ago

Thank you very much!

Walleclipse commented 5 years ago

You are welcome