fossasia / susi_android

SUSI.AI Android App https://play.google.com/apps/testing/ai.susi
Apache License 2.0

Add feature to train hotword detection using Snowboy training API #806

Closed chiragw15 closed 5 years ago

chiragw15 commented 7 years ago

As mentioned in this comment (https://github.com/fossasia/susi_android/pull/710#issuecomment-312267662), implement a feature to train hotword detection using the Snowboy training API.

Problem for now: Hotword detection uses a model file from the Snowboy website. There are two types of model file: susi.pmdl and susi.umdl, where pmdl stands for personal model and umdl for universal model. Getting a personal model file (.pmdl) is simple: say 'susi' three times on the Snowboy website, download the personal model, and use it. The drawback is that a personal model is trained for one specific person. I am currently using my own personal model (susi.pmdl) for hotword detection, so it works well for me and for people whose voice is similar to mine, but not for everyone. To get a universal model file (.umdl), at least 500 people need to train the susi hotword on the Snowboy website by saying 'susi' three times each. Once we have a universal model file, the hotword will work well for everyone. That may take a while, though: right now only 10 people have trained the model, so we need 490 more.

Alternative for now: Snowboy provides an API to train and retrieve a .pmdl file. As mentioned in the comment above, the Snowboy docs say:

You can define truly customized hotword for each of your end customer. Just ask them to say the hotword 3 times and a model will be trained on the fly!

    Endpoint: https://snowboy.kitt.ai/api/v1/train/
    Type: POST
    Return: a binary personal model (.pmdl), or error

We can generate a .pmdl model for every user by asking them to say 'susi' three times after installing the app. When the user first runs the app after installation, they will be prompted to say 'susi' three times. These three recordings will then be sent to the training API (http://docs.kitt.ai/snowboy/#api-v1-train) in a POST request. The API will return a .pmdl file, which will then be used for hotword detection. This way everyone uses their own personal model and hotword detection works smoothly for them. Once we have a universal model trained by 500 people, we can update the app to use the universal model, which will work for everyone, and new users will no longer need to train a model with their own voice.
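A minimal sketch of what such a training request could look like in plain Java. The endpoint is the one quoted above. The JSON field names (name, token, voice_samples, wave) are recalled from the Snowboy documentation and are not stated in this issue, so treat them as assumptions and check them against the docs, which also list further metadata fields that are omitted here. SnowboyTrainRequest, buildBody and train are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

final class SnowboyTrainRequest {
    // Endpoint as quoted from the Snowboy docs in this issue.
    static final String ENDPOINT = "https://snowboy.kitt.ai/api/v1/train/";

    /** Wraps a value in double quotes, escaping backslashes and quotes for JSON. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds the JSON body for one training request from the recorded WAV files.
     * The field names are assumptions (see the note above), not confirmed here.
     */
    static String buildBody(String token, String hotword, List<byte[]> wavSamples) {
        StringBuilder json = new StringBuilder();
        json.append("{\"name\":").append(quote(hotword))
            .append(",\"token\":").append(quote(token))
            .append(",\"voice_samples\":[");
        for (int i = 0; i < wavSamples.size(); i++) {
            if (i > 0) json.append(',');
            String wave = Base64.getEncoder().encodeToString(wavSamples.get(i));
            json.append("{\"wave\":").append(quote(wave)).append('}');
        }
        return json.append("]}").toString();
    }

    /** POSTs the body and returns the response bytes (the .pmdl model on success). */
    static byte[] train(String body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(ENDPOINT).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        if (status >= 400) {
            throw new IOException("Training failed with HTTP " + status);
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream model = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                model.write(chunk, 0, n);
            }
            return model.toByteArray();
        }
    }
}
```

On Android the call to train would have to run off the main thread. java.util.Base64 is only available from API level 26, so android.util.Base64 would be the substitute on older devices.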

screenshot_2017-07-18-23-04-44

screenshot_2017-07-18-23-04-37

hardik124 commented 7 years ago

Hey @chiragw15. If we get this file, where should we store it? I think I can take up this issue. Are you currently working on it, or can I go ahead with it?

chiragw15 commented 7 years ago

If we get this file where should we store it?

There is a folder named snowboy in storage. We have to store the file in that folder under the name susi.pmdl.
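A small sketch of that storage step, assuming the model bytes are already in memory. The folder and file names come from the comment above. ModelStore, saveModel and the storageRoot parameter are hypothetical, since the thread does not say which storage root the app uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ModelStore {
    /** Writes the trained model to <storageRoot>/snowboy/susi.pmdl and returns its path. */
    static Path saveModel(Path storageRoot, byte[] pmdl) throws IOException {
        Path dir = storageRoot.resolve("snowboy");
        Files.createDirectories(dir);           // does nothing if the folder already exists
        Path target = dir.resolve("susi.pmdl");
        return Files.write(target, pmdl);       // creates the file or overwrites an old model
    }
}
```

Overwriting means that retraining simply replaces the previous personal model. java.nio.file needs Android API level 26 or later. On older devices the same logic can be written with java.io.File and FileOutputStream.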

Please check this PR https://github.com/fossasia/susi_android/pull/889

In this PR, I have described the problems I am facing. It would be great if you could help me out.

hardik124 commented 7 years ago

@chiragw15, I was somehow able to extract an AMR file from the Google speech recognizer. I will make a PR as soon as I fix this issue. Could you please tell me all the parameters the API requires?

hardik124 commented 7 years ago

@chiragw15 Please tell me which parameters have to be used.

chiragw15 commented 7 years ago

@hardik124 http://docs.kitt.ai/snowboy/#api-v1-train

hardik124 commented 7 years ago

@chiragw15 There is a secret user token required. What is that?

chiragw15 commented 7 years ago

screenshot_20170909-012329 @hardik124

chiragw15 commented 7 years ago

Use your account for now. Will change it later.

hardik124 commented 7 years ago

@chiragw15 Sure. Is it fine if I make an activity, or should I make a fragment?

chiragw15 commented 7 years ago

No need to write the code again. I have already written most of the code for this issue in #889. Pull that PR to your local machine and work from there. To pull a PR locally, use git fetch upstream pull/889/head:BRANCHNAME and then check out that branch using git checkout BRANCHNAME

hardik124 commented 7 years ago

Yeah, will do.

hardik124 commented 7 years ago

@chiragw15, ffmpeg is failing. Do you know of any other way to convert AMR to WAV? Should I try building ffmpeg from scratch (NDK)?

chiragw15 commented 7 years ago

ffmpeg is failing.

Exactly. I was facing the same issue. This is the reason I was not able to proceed further. I didn't find another way to do that. Don't build ffmpeg from scratch. @chashmeetsingh Can you help us here? How did you implement this on iOS?

chashmeetsingh commented 7 years ago

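I am not sure how that's supposed to be done on Android. For iOS, what I did was:

  • Run the audio engine
  • Save the audio buffer in a .wav file
  • Converted that to base64 and used it in the API

@chiragw15 ^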

hardik124 commented 7 years ago

Did you not verify that the word is in fact 'susi'?

On 22-Sep-2017 7:16 PM, "Chashmeet Singh" notifications@github.com wrote:

I am not sure how that's supposed to be done on android. For iOS, what I did was:

  • Run the audio engine
  • Save the audio buffer in a .wav file
  • Converted that to base64 and used it in the API

@chiragw15 ^


chashmeetsingh commented 7 years ago

I did. That is done with the help of the audio recorder and the speech-to-text (STT) that run alongside the audio engine. The audio engine and the STT run simultaneously. Whenever 'SUSI' is spoken, the STT stops, and I take the buffer and convert it to base64.
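For reference, an Android-side analogue of those steps could be sketched in plain Java as below: wrap the raw PCM buffer in a WAV header, then base64-encode the result. This is not the iOS code, only an illustration. WavEncoder and its methods are hypothetical names, and the 16-bit sample format is an assumption about what the recorder produces.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class WavEncoder {
    private static byte[] ascii(String tag) {
        return tag.getBytes(StandardCharsets.US_ASCII);
    }

    /** Prepends a 44-byte RIFF/WAVE header to raw 16-bit PCM audio. */
    static byte[] toWav(byte[] pcm, int sampleRate, int channels) {
        int bitsPerSample = 16;                           // assumed sample format
        int blockAlign = channels * bitsPerSample / 8;    // bytes per sample frame
        int byteRate = sampleRate * blockAlign;           // bytes per second
        ByteBuffer buf = ByteBuffer.allocate(44 + pcm.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ascii("RIFF")).putInt(36 + pcm.length).put(ascii("WAVE"));
        buf.put(ascii("fmt ")).putInt(16);                // fmt chunk is 16 bytes for PCM
        buf.putShort((short) 1);                          // audio format 1 = uncompressed PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put(ascii("data")).putInt(pcm.length);
        buf.put(pcm);
        return buf.array();
    }

    /** Produces the base64 text of the WAV file, ready to embed in an API request. */
    static String toBase64Wav(byte[] pcm, int sampleRate, int channels) {
        return Base64.getEncoder().encodeToString(toWav(pcm, sampleRate, channels));
    }
}
```

Recording raw PCM directly and adding the header this way would avoid the AMR-to-WAV conversion step discussed above, provided the recorder can deliver PCM in the first place.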

hardik124 commented 7 years ago

@chiragw15 @chashmeetsingh, I was able to convert using FFmpeg. It turns out the problem was with our Uri itself. Conversion takes a lot of time, so I think we should use a job scheduler for it and show a progress notification. I am attaching the WAV files converted using ffmpeg: recordings.zip

hardik124 commented 6 years ago

@chiragw15 I tweaked ffmpeg to convert the files to the correct format. However, when I make a request to the API, I get a 502 error. Can you look into it?

aayushsingla commented 5 years ago

Is this still relevant?

naman653 commented 5 years ago

@batbrain7 @arundhati24 @iamareebjamal Can you please tell me whether this issue is still relevant? I know there are bugs with voice detection, but I think the Snowboy API has already been implemented. I would like to work on this issue if you can tell me which major problems still need to be resolved.