@aouinizied, an initial skeleton is ready to process the utterances. Currently the mp4 files are decoded and saved as wav files. We can start working on splitting and MFCC extraction; this should be done by evolving the `process_audio` function.
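For reference, a minimal sketch of what the decoding step could look like, assuming ffmpeg is available on the path; the directory layout, sample rate, and the helper name `decode_mp4_to_wav` are illustrative assumptions, not the actual `process_audio` implementation:

```python
import subprocess
from pathlib import Path

def decode_mp4_to_wav(src: Path, dst: Path, sample_rate: int = 16000) -> None:
    """Extract the audio track of an mp4 file to a mono wav file via ffmpeg."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-vn",                    # drop the video stream
         "-ac", "1",               # mono
         "-ar", str(sample_rate),  # resample (16 kHz assumed here)
         str(dst)],
        check=True,
    )

# Example: mirror the voxceleb2 directory tree, replacing .mp4 with .wav.
for mp4 in Path("voxceleb2").rglob("*.mp4"):
    decode_mp4_to_wav(mp4, Path("wav") / mp4.relative_to("voxceleb2").with_suffix(".wav"))
```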
Implement feature extraction independently from the training part: since this step takes some time, it should be done once so that multiple trainings can run without recomputing the features each time.
- Input: the voxceleb2 dataset.
- Output: an MFCC binary dataset (MFCC storage format still to be defined).
- Configuration parameters: at least the utterance length used for feature extraction.
Utterances longer than the configured feature length are split into several utterances; this increases the size of the dataset (see the sketch below).
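A minimal sketch of the splitting + MFCC step, assuming librosa and one `.npy` file per chunk; the chunk length, `n_mfcc`, and storage layout are placeholders, since the issue leaves the storage format to be defined:

```python
from pathlib import Path
import librosa
import numpy as np

def extract_mfcc(wav_path: Path, out_dir: Path,
                 chunk_seconds: float = 3.0, n_mfcc: int = 20) -> None:
    """Split a wav file into fixed-length chunks and save one MFCC array per chunk."""
    y, sr = librosa.load(str(wav_path), sr=None)  # keep the native sample rate
    chunk_len = int(chunk_seconds * sr)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Longer utterances yield several chunks, which increases the dataset size.
    for i in range(len(y) // chunk_len):
        chunk = y[i * chunk_len:(i + 1) * chunk_len]
        mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=n_mfcc)
        np.save(out_dir / f"{wav_path.stem}_{i:03d}.npy", mfcc)
```

Because the arrays are written to disk once, later training runs could load them directly instead of recomputing the features each time.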
Note: no background sound is added and no volume adjustment is performed, since the voxceleb2 dataset already contains various backgrounds.