HPI-DeepLearning / crnn-lid

Code for the paper Language Identification Using Deep Convolutional Recurrent Neural Networks
GNU General Public License v3.0

Question #6

Open lvaleriu opened 5 years ago

lvaleriu commented 5 years ago

Why do we actually need this step: "Use ffmpeg to convert and split WAV files into 10 second parts"?

After downloading, we have large WAV files. We can convert them directly to spectrogram image files, and that step slices the image into 10-second spectrograms anyway.
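As a rough illustration of that idea, the sketch below computes the sample ranges of each full 10-second window in a long recording, so that spectrograms could be generated per window without writing intermediate WAV segments. The function name `segment_bounds` and the 16 kHz sample rate are assumptions made for this example; they are not taken from the repository's code.

```python
def segment_bounds(num_samples, sample_rate, segment_seconds=10):
    """Return (start, end) sample indices for every full segment.

    Hypothetical helper for illustration only. A trailing remainder
    shorter than one full segment is dropped.
    """
    segment_len = sample_rate * segment_seconds
    return [
        (start, start + segment_len)
        for start in range(0, num_samples - segment_len + 1, segment_len)
    ]


# Assumed example input: 35 seconds of audio sampled at 16 kHz.
bounds = segment_bounds(num_samples=35 * 16000, sample_rate=16000)
for start, end in bounds:
    print(start, end)
```

Each `(start, end)` pair can then be used to slice the decoded audio array before computing one spectrogram per slice.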

Bartzi commented 5 years ago

Of course you can also do it this way. If you think it works better for you, then go ahead.

lvaleriu commented 5 years ago

It is mainly because I then don't need to store the segmented WAV files as well (they take 88 GB on my disk). For the same reason, I now save the downloaded YouTube files directly as MP3. And I've managed to extract 10-second spectrograms from the MP3s quite fast, actually.

omfuke commented 3 years ago

How much data should I use for classifying between Hindi and English? Are 20,000 spectrograms per language sufficient?
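For a sense of scale, here is a small sketch that converts a spectrogram count into hours of audio. It assumes each spectrogram covers 10 seconds of non-overlapping audio, as in the segments discussed earlier in this thread; the function name `hours_of_audio` is made up for this example.

```python
def hours_of_audio(num_spectrograms, seconds_per_spectrogram=10):
    """Hours of audio represented by a set of spectrograms.

    Assumes non-overlapping segments of equal length (an assumption,
    not something confirmed by the thread).
    """
    return num_spectrograms * seconds_per_spectrogram / 3600


print(round(hours_of_audio(20000), 1))  # 55.6
```

Under that assumption, 20,000 spectrograms correspond to roughly 55.6 hours of audio per language.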

Bartzi commented 3 years ago

Sounds like a good amount of data. I think it could work!