@aouinizied, an initial skeleton is ready to process the utterances. Currently the mp4 files are decoded and saved as wav files. We can start working on splitting and MFCC extraction; this should be done by evolving the `process_audio` function.
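For reference, a minimal sketch of what the decoding step could look like, assuming ffmpeg is available on the path; the directory layout, sample rate, and the helper name `decode_mp4_to_wav` are illustrative assumptions, not the actual `process_audio` implementation:

```python
import subprocess
from pathlib import Path

def decode_mp4_to_wav(src: Path, dst: Path, sample_rate: int = 16000) -> None:
    """Extract the audio track of an mp4 file to a mono wav file via ffmpeg."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-vn",                    # drop the video stream
         "-ac", "1",               # mono
         "-ar", str(sample_rate),  # resample (16 kHz assumed here)
         str(dst)],
        check=True,
    )

# Example: mirror the voxceleb2 directory tree, replacing .mp4 with .wav.
for mp4 in Path("voxceleb2").rglob("*.mp4"):
    decode_mp4_to_wav(mp4, Path("wav") / mp4.relative_to("voxceleb2").with_suffix(".wav"))
```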
Implement feature extraction independently from the training part: since this step takes some time, it should be done once so that multiple trainings can run without recomputing the features each time.
- Input: the voxceleb2 dataset.
- Output: an MFCC binary dataset (MFCC storage format still to be defined).
- Configuration parameters: at least the utterance length used for feature extraction.
Utterances longer than the configured feature length are split into several utterances; this increases the size of the dataset (see the sketch below).
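A minimal sketch of the splitting + MFCC step, assuming librosa and one `.npy` file per chunk; the chunk length, `n_mfcc`, and storage layout are placeholders, since the issue leaves the storage format to be defined:

```python
from pathlib import Path
import librosa
import numpy as np

def extract_mfcc(wav_path: Path, out_dir: Path,
                 chunk_seconds: float = 3.0, n_mfcc: int = 20) -> None:
    """Split a wav file into fixed-length chunks and save one MFCC array per chunk."""
    y, sr = librosa.load(str(wav_path), sr=None)  # keep the native sample rate
    chunk_len = int(chunk_seconds * sr)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Longer utterances yield several chunks, which increases the dataset size.
    for i in range(len(y) // chunk_len):
        chunk = y[i * chunk_len:(i + 1) * chunk_len]
        mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=n_mfcc)
        np.save(out_dir / f"{wav_path.stem}_{i:03d}.npy", mfcc)
```

Because the arrays are written to disk once, later training runs could load them directly instead of recomputing the features each time.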
Note: no background sound is added and no volume adjustment is performed, since the voxceleb2 dataset already contains various backgrounds.