HPI-DeepLearning / crnn-lid

Code for the paper Language Identification Using Deep Convolutional Recurrent Neural Networks
GNU General Public License v3.0

Performance on short speech #12

Open huntingriver opened 5 years ago

huntingriver commented 5 years ago

Hi there, first, thanks for the toolkit.

I am interested in applying this to short audio clips. As a simple test, I chopped the audio files in web-server/audio/samples into 10-second segments and ran predict.py on each segment separately, using the existing model from the web-server folder (assuming this model would be the best ;)). When predicting the segments separately, the accuracy seemed quite low, about 60%. Similar tests with our own dataset gave worse results... I understand short audio is much tougher, but I still wonder if you have any insights into how we could improve this. Thanks in advance.
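
For reference, the chopping step can be sketched roughly as follows. This is only an illustration, not code from the repository: the helper name `split_wav` and the output naming are made up, and it assumes the inputs are uncompressed PCM WAV files (other formats would need converting first). A trailing remainder shorter than one full segment is dropped.

```python
import wave


def split_wav(path, segment_seconds=10, out_prefix="segment"):
    """Split a PCM WAV file into fixed-length segments (hypothetical helper).

    Returns the list of written file paths. Any trailing audio shorter
    than one full segment is discarded.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bytes_per_frame = params.sampwidth * params.nchannels
        frames_per_segment = int(params.framerate * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            # readframes returns fewer bytes than requested at end of file
            if len(frames) // bytes_per_frame < frames_per_segment:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each resulting file can then be passed to predict.py on its own, as in the test described above.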

Ben

Bartzi commented 5 years ago

Hi,

We are already applying the model to 10-second segments of the original audio files. The model is trained on 10-second segments, so it should actually give you similar results. We also experimented with 5-second segments and found that performance degrades with only 5-second snippets, most likely because there is not enough time-based information for the network to accurately extract the speech information. All in all, though, it was not that bad with 5-second segments (we had an F1 score of 91%), so shorter segments should still work.