WeidiXie / VGG-Speaker-Recognition

Utterance-level Aggregation For Speaker Recognition In The Wild

Preprocessing function - WAV extending #60

Closed: celpas closed this issue 4 years ago

celpas commented 4 years ago

Hi. I have a question about the preprocessing function. Why, in evaluation mode, is the wav extended by appending a reversed copy of itself?

import numpy as np
import librosa

def load_wav(vid_path, sr, mode='train'):
    wav, sr_ret = librosa.load(vid_path, sr=sr)
    assert sr_ret == sr
    if mode == 'train':
        # double the signal by appending a copy of itself
        extended_wav = np.append(wav, wav)
        # with 30% probability, reverse the whole doubled signal (augmentation)
        if np.random.random() < 0.3:
            extended_wav = extended_wav[::-1]
        return extended_wav
    else:
        # evaluation: append a time-reversed copy of the original signal
        extended_wav = np.append(wav, wav[::-1])
        return extended_wav
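
Just to illustrate what the two modes return (the file path and 16 kHz sample rate below are placeholder assumptions, not from the repo; any mode other than 'train' takes the else branch):

train_wav = load_wav('utterance.wav', sr=16000, mode='train')  # wav followed by wav, possibly reversed as a whole
eval_wav = load_wav('utterance.wav', sr=16000, mode='eval')    # wav followed by its time-reversed copy
# in both cases the returned signal is twice the length of the original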

Also, I'm getting slightly different results on the VoxCeleb1 test set from those reported in the paper when using your weights. Are these the same weights used in the paper, or am I doing something wrong?

WeidiXie commented 4 years ago

Hi,

During inference, the wav is simply extended with a time-reversed copy of itself; this is a very naive test-time augmentation.

About the results: this is the model from the paper, and you should get 3.22 on the VoxCeleb1 test set. How large a difference are you seeing?