alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

what should be the output of your speech server? #103

Open mariasmo opened 6 years ago

mariasmo commented 6 years ago

Hi alumae,

I am using the following client:

https://github.com/truongdo/kaldi-gstreamer-android-client

and spoke some English words. The output seems to contain no transcribed text, although when I debugged I could see other properties of the returned JSON object. Is this expected, since I am not speaking Estonian? If so, could you suggest some words for me to try, just to make sure the Android client referenced above is working? (It is about 3 years old and I am not sure of its state, so I want to confirm it works before I write my own client based on that code base.)

I also have another question.

I got your source code and was able to get the server, worker, and client all working on a Google Cloud virtual server. It's working great.

Now my question is: if the Android client mentioned above works, can I point it to the server running on my Google Cloud machine, and will it work? Do I need to write any additional HTTP code (JavaScript files, etc.)? Sorry, I have very little experience in web programming. In other words, will my server return the same JSON object that your service returns, without me writing any additional code?

The reason I want to do this is so that I can train my own language models in English.

Thanks in advance.

Kaljurand commented 6 years ago

I would recommend using Kõnele (https://github.com/Kaljurand/K6nele) as the basis of your own Android client. It covers all the features of https://github.com/truongdo/kaldi-gstreamer-android-client and has been updated more recently.

For instructions on configuring Kõnele to use a different server, see e.g. https://github.com/jcsilva/docker-kaldi-gstreamer-server

As for testing with Estonian, you can play back an existing audio file (e.g. http://etteytlus.err.ee/1f1dcb59305f4d08283e959cf0240234.mp3) on your laptop, holding your app next to the laptop's speaker. Or use an existing text-to-speech engine to generate the audio (e.g. https://translate.google.com/#et/en/1%202%203%204%205%206%207%208%209%200).
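
To check whether a returned JSON object actually carries transcribed text (the situation described in the question, where other properties were present but no transcript), a small helper along these lines may be useful. This is only a sketch: the field names `status`, `result`, `hypotheses`, `transcript`, and `final`, and the convention that `status` 0 means success, are assumptions about the response format and are not confirmed anywhere in this thread. The function name `extract_transcript` and the sample message are hypothetical.

```python
import json


def extract_transcript(message: str):
    """Return (transcript, is_final) for one server message, or None if it has no text.

    The message layout used here is an assumption, not something stated in this thread.
    """
    data = json.loads(message)

    # Assumed convention: status 0 means success; anything else carries no usable text.
    if data.get("status") != 0:
        return None

    result = data.get("result")
    if not result:
        return None  # e.g. a status-only message

    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None

    # Take the first hypothesis and ignore empty transcripts.
    text = hypotheses[0].get("transcript", "").strip()
    if not text:
        return None

    return text, bool(result.get("final", False))


# Hypothetical sample message, made up for illustration.
sample = '{"status": 0, "result": {"hypotheses": [{"transcript": "üks kaks kolm"}], "final": true}}'
print(extract_transcript(sample))
```

If the helper returns `None` for every message while the connection itself succeeds, one possible explanation is that the server recognized nothing in the audio, which would be consistent with speaking English to an Estonian model.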