MainRo / deep-speaker

An implementation of deep speaker from baidu
MIT License

Implement feature extraction #2

Open MainRo opened 6 years ago

MainRo commented 6 years ago

Implement feature extraction independently from the training part. Since this step takes some time, it should be done once so that multiple trainings can run without recomputing the features each time.

Input is the voxceleb2 dataset. Output is an MFCC binary dataset (storage format still to be defined). Configuration parameters:

Utterances longer than the feature window are split into several utterances. This increases the size of the dataset.
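The splitting step can be sketched as follows; `split_utterance`, the segment length, and the optional overlap hop are illustrative assumptions, not the repository's actual API.

```python
import numpy as np


def split_utterance(samples, segment_len, hop=None):
    """Hypothetical helper: split a long utterance into fixed-length segments.

    A hop smaller than segment_len yields overlapping segments, which
    further increases the dataset; the short tail remainder is dropped.
    """
    hop = hop or segment_len
    segments = [
        samples[start:start + segment_len]
        for start in range(0, len(samples) - segment_len + 1, hop)
    ]
    return np.stack(segments) if segments else np.empty((0, segment_len))
```

For example, a 10-sample signal split into length-4 segments with hop 3 yields three segments starting at samples 0, 3, and 6.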

Notes: No background sound is added and no volume adjustment is done because the voxceleb2 dataset already contains various backgrounds.
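For reference, the MFCC computation itself follows a standard pipeline: frame the signal, window it, take the power spectrum, apply a mel filterbank, take logs, and apply a DCT. The minimal numpy sketch below illustrates that pipeline; all parameter defaults (16 kHz, 512-point FFT, 10 ms hop, 26 mel bands, 13 coefficients) are assumptions, and a real implementation would more likely use a library such as librosa or python_speech_features.

```python
import numpy as np


def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: frame, window, power spectrum, mel filterbank,
    log, DCT-II. Returns an array of shape (n_frames, n_ceps)."""
    # Frame the signal with a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients.
    feat = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1)
                 / (2 * n_mels))
    return feat @ dct.T
```

Since this step runs once per utterance over the whole corpus, precomputing and storing the result is exactly what makes repeated trainings cheap.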

MainRo commented 5 years ago

@aouinizied, an initial skeleton is ready to process the utterances. Currently the mp4 files are decoded and saved as wav files. We can start working on splitting and MFCC extraction. This should be done by evolving the process_audio function.
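The decode step described above is commonly done with ffmpeg. A minimal sketch, assuming the `ffmpeg` binary is available (the repository's process_audio may decode differently): build the command that extracts a 16 kHz mono wav from an mp4, then run it with subprocess.

```python
import subprocess


def ffmpeg_decode_cmd(mp4_path, wav_path, sr=16000):
    """Hypothetical helper: ffmpeg command to decode an mp4's audio track
    to a mono wav resampled to sr Hz (-ac 1 = mono, -ar = sample rate,
    -y = overwrite an existing output file)."""
    return ["ffmpeg", "-y", "-i", mp4_path,
            "-ac", "1", "-ar", str(sr), "-f", "wav", wav_path]


def decode_mp4_to_wav(mp4_path, wav_path, sr=16000):
    """Run the decode; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_decode_cmd(mp4_path, wav_path, sr), check=True)
```

Keeping the command construction separate from the subprocess call makes the decode step easy to unit-test without ffmpeg installed.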