Jakobovski / free-spoken-digit-dataset

A free audio dataset of spoken digits. An audio version of MNIST.

Add other languages #34

Open ujagaga opened 3 years ago

ujagaga commented 3 years ago

Hi! Would you consider adding the Serbian language to the dataset? I am interested in contributing my own voice and as many others as I can gather. I suppose this would also be simpler to accomplish if we could gather audio online using an automated website.

Jakobovski commented 3 years ago

Why do you want to use the Serbian language?

ujagaga commented 3 years ago

> Why do you want to use the Serbian language?

Because it is my native language and my older relatives do not speak English well. I intend to collect my own samples, so I just deployed a website to collect samples in Serbian. So far I have shared it with a specific group of Facebook friends, but soon I will ask others to join, so I hope to gather a decent number of samples.

https://audiosampler.herokuapp.com/

I adjusted the website code so it can be used for any language and uploaded it to GitHub:

https://github.com/ujagaga/audioSampler

so if you reference it here, perhaps the audio repository can grow to include other languages too. My goal is to train a personal assistant for offline speech-to-text and custom command execution in Serbian.