flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

[How can I use mel-spectrogram as features?] #805

Closed mlexplore1122 closed 3 years ago

mlexplore1122 commented 4 years ago

Question

I checked Defines.cpp and see that wav2letter only supports mfsc or mfcc features; there is no option for using mel-spectrograms. I need to use mel-spectrograms as features. How can I use mel-spectrograms as features in wav2letter? Thank you.

lunixbochs commented 4 years ago

mfsc and mfcc are both variants of mel spec. mfcc is mfsc with a DCT. mfsc works fine.
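To make the relationship concrete, here is a minimal pure-Python sketch of the textbook definitions: MFSC as log mel filterbank energies, and MFCC as a DCT over those same values. It is not wav2letter's featurization code (which, like most toolkits, also applies steps such as windowing that are skipped here), and all function names, the 20-filter and 13-coefficient settings, and the 16 kHz sample rate are assumptions chosen for illustration.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for one short demo frame)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, converted from mel to (fractional) FFT bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfsc(frame, n_filters=20, sample_rate=16000):
    """Log mel filterbank energies: this is the log-mel spectrogram for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energy so log() never sees zero
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]

def dct2(x):
    """Unnormalized type-II DCT."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(x))
            for k in range(n)]

def mfcc(frame, n_coeffs=13, n_filters=20, sample_rate=16000):
    """MFCC = DCT of the MFSC vector, keeping the first n_coeffs coefficients."""
    return dct2(mfsc(frame, n_filters, sample_rate))[:n_coeffs]

# One 256-sample frame of a 440 Hz tone at an assumed 16 kHz sample rate
frame = [math.sin(2 * math.pi * 440.0 * t / 16000) for t in range(256)]
log_mel = mfsc(frame)
cepstra = mfcc(frame)
print(len(log_mel), len(cepstra))
```

Under these definitions, the per-frame MFSC vectors stacked over time are a log-mel spectrogram, so a model fed mfsc features already sees mel-spectrogram input; mfcc only adds the DCT and truncation on top.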

mlexplore1122 commented 4 years ago

Thanks, lunixbochs. I know mfsc and mfcc are both variants of mel spec, but I don't deeply understand the difference between them. My language is tonal, so pitch features are important. I have trained on my dataset with NeMo (QuartzNet) and with a transformer in ESPnet, and both converge quickly and give good results. But both of them use mel-spectrogram features, so I am not sure whether the problem I see when training the streaming convnet model comes from the features or from the difference in network architecture. Do you have any suggestions?

tlikhomanenko commented 3 years ago

The issue is probably the architecture and its hyperparameters for your data, so you will need to tweak the model size and optimization settings to make it work with your data.