WeidiXie / VGG-Speaker-Recognition

Utterance-level Aggregation For Speaker Recognition In The Wild

feat: remove problem specific code and commented out code #65

Closed: bml1g12 closed this 3 years ago

bml1g12 commented 3 years ago

Just a little cleanup after the last PR, removing some code that was specific to my use case.

Note that I had to replace

    wav, sr_ret = librosa.load(vid_path, sr=sr)
    assert sr_ret == sr

with

    wav, sr_ret = sf.read(vid_path)  # returns the file's native sample rate; no resampling to sr

to work around librosa taking 1 second to load each clip. However, this may be a problematic fix: unlike librosa.load, sf.read does not resample, so the sample rate can vary from clip to clip. As such, this PR restores the original load_wav as the default and adds a note to the README explaining the sf.read option.
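To illustrate the trade-off, here is a hedged, stdlib-only sketch of a loader that keeps every clip at one fixed rate. The helper load_wav_fixed_sr is hypothetical: it is not the repository's load_wav and uses neither librosa nor soundfile. It reads a mono 16-bit PCM WAV with Python's wave module. When the file's rate differs from target_sr, it resamples by naive linear interpolation. Linear interpolation applies no anti-aliasing filter, so its quality is lower than the resampling librosa.load performs. Treat it as an illustration of the idea, not a drop-in replacement.

```python
import array
import sys
import wave


def load_wav_fixed_sr(path, target_sr):
    """Hypothetical helper: read mono 16-bit PCM WAV and return (samples, target_sr).

    Sketch only; assumes mono int16 input and uses naive linear interpolation.
    """
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2 or f.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        src_sr = f.getframerate()
        samples = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Scale int16 samples to floats in [-1.0, 1.0).
    wav = [s / 32768.0 for s in samples]
    if src_sr == target_sr:
        return wav, target_sr
    # Naive linear-interpolation resampling (no anti-aliasing filter).
    n_out = round(len(wav) * target_sr / src_sr)
    out = []
    for i in range(n_out):
        pos = i * src_sr / target_sr  # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = wav[j + 1] if j + 1 < len(wav) else wav[j]
        out.append(wav[j] * (1 - frac) + nxt * frac)
    return out, target_sr
```

For example, a clip stored at 8 kHz and loaded with target_sr=16000 comes back with twice as many samples, so downstream code always sees the same rate regardless of how each file was stored.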