alphacep / vosk-android-service

Offline voice typing for Android
Apache License 2.0

Doesn't work in AnySoftKeyboard somehow, requires google instead #4

Closed nshmyrev closed 1 year ago

nshmyrev commented 2 years ago

Probably it changed over time

paolo-caroni commented 2 years ago

Try LocalSTT: there is an English .apk package, and it runs on AnySoftKeyboard. It uses vosk, so it is basically the same as this app.

sogaiu commented 1 year ago

@nshmyrev I tried adding some Log.d calls and checks to onActivityResult in AnySoftKeyboard's ActivityHelper around here: https://github.com/AnySoftKeyboard/AnySoftKeyboard/blob/a8b18b340b657a548266cf0da24368d7f3d9cb90/ime/voiceime/src/main/java/com/google/android/voiceime/ActivityHelper.java#L61-L71

After installing a newly built apk I tried testing the vosk android service via AnySoftKeyboard's voice typing button. I saw the dialog box for vosk show up along with a sound so I tried speaking a word but I didn't end up seeing the word show up as text in the apps I tried with.

Going through logcat output, I see output like this:

10-29 22:48:27.481 15368 15368 I SpeechRecognizerActivity: onResults
10-29 22:48:27.482 15368 15368 I SpeechRecognizerActivity: target
10-29 22:48:27.483 15368 15368 D SpeechRecognizerActivity: Intent { act=android.speech.action.RECOGNIZE_SPEECH flg=0x13000000 cmp=org.vosk.demo/org.vosk.service.ui.SpeechRecognizerActivity (has extras) }
10-29 22:48:27.483 15368 15368 D SpeechRecognizerActivity: Bundle[mParcelledData.dataSize=300]
10-29 22:48:27.483 15368 15368 D SpeechRecognizerActivity: No pending intent, setting result intent.

I took that as a sign that the service succeeded in receiving the utterance and it performed its recognition.

Then a bit later I can see that ActivityHelper's onActivityResult is called:

10-29 22:48:27.515 14705 14705 D ActivityHelper: onActivityResult: intent is null!

I checked the value of the data parameter (an Intent) to onActivityResult and it appears to be null for some reason.

I observed these results on two Samsung devices -- one with Android 10 and the other with Android 12.

nshmyrev commented 1 year ago

@sogaiu Thanks for the information, it is useful. Let's try to investigate deeper.

sogaiu commented 1 year ago

@nshmyrev I looked a bit more closely and haven't come up with much.

One thing I had missed previously is that resultCode was 0 -- which IIUC corresponds to RESULT_CANCELED. Perhaps data being null makes sense in this context.

As to why the resultCode is RESULT_CANCELED though, I have no idea.
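For reference, the observation above can be turned into a small logging helper. This is only a sketch in plain Java so it runs without the Android SDK: the constant values are copied from android.app.Activity (RESULT_CANCELED is 0, RESULT_OK is -1), while the class name ResultCodeSketch and the describe method are hypothetical and not part of AnySoftKeyboard or vosk-android-service.

```java
// Hypothetical helper; not code from AnySoftKeyboard or vosk-android-service.
final class ResultCodeSketch {
    // Values copied from android.app.Activity so this compiles without the SDK.
    static final int RESULT_CANCELED = 0;
    static final int RESULT_OK = -1;

    // Builds a log-friendly description of what onActivityResult received.
    static String describe(int resultCode, boolean dataIsNull) {
        String name;
        switch (resultCode) {
            case RESULT_OK:
                name = "RESULT_OK";
                break;
            case RESULT_CANCELED:
                name = "RESULT_CANCELED";
                break;
            default:
                name = "custom result " + resultCode;
                break;
        }
        return name + (dataIsNull ? " (data is null)" : " (data present)");
    }

    public static void main(String[] args) {
        // The combination reported in this comment: resultCode 0 and a null Intent.
        System.out.println(describe(0, true));
    }
}
```

Inside a real onActivityResult, the same call would presumably look like describe(resultCode, data == null) passed to Log.d.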

sogaiu commented 1 year ago

I've been coming across a claim of startActivityForResult / onActivityResult being deprecated but haven't determined specifically which circumstances apply and whether it's relevant for vosk-android-service / AnySoftKeyboard.

For reference, the following is an SO answer that mentions it in the context of the post "OnActivityResult always returns RESULT CANCELED (Code: 0) on Android 11": https://stackoverflow.com/questions/70952359/onactivityresult-always-returns-result-canceled-code-0-on-android-11

Update: this doesn't seem to be relevant AFAICT based on my testing.

sogaiu commented 1 year ago

I tried removing android:launchMode="singleInstance" from: https://github.com/AnySoftKeyboard/AnySoftKeyboard/blob/a8b18b340b657a548266cf0da24368d7f3d9cb90/ime/voiceime/src/main/AndroidManifest.xml#L15 and that results in the data parameter becoming non-null.

I'm not sure about this change, but at least locally I get much better behavior. Specifically, recognition appears to be performed and I see a dialog box that shows candidate results. Choosing a result leads to the chosen text being inserted into the appropriate context (though sometimes this appears to be delayed).


There was an SO answer that had the following quote (maybe from older Android docs?):

For example, if the activity you are launching uses the singleTask launch mode, it will not run in your task and thus you will immediately receive a cancel result.

FWIW, I didn't manage to find exactly this text in the current docs.

sogaiu commented 1 year ago

So far I've only had luck with English.

I downloaded the Chinese model via vosk-android-service's ModelListActivity(?) and set it to be active, but I haven't had success recognizing Chinese yet. Recognition occurs and I see the dialog, but the candidates all look to be English.

Any hints?

nshmyrev commented 1 year ago

@sogaiu thanks for exploring this! Very useful. As for Chinese, check the logcat output, it should have some details.

sogaiu commented 1 year ago

@nshmyrev I noticed some (related?) error-like log lines in logcat output about the value associated with EXTRA_LANGUAGE needing to be String, but somehow it was a Locale. Perhaps the following line is relevant:

https://github.com/alphacep/vosk-android-service/blob/6a719773c100a87ef8d96782326a98f29d714b8c/app/src/main/java/org/vosk/service/ui/SpeechRecognizerActivity.java#L177

I tried changing that to:

speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toString()); 

and the errors seemed to go away.

Unfortunately, that didn't yield non-English recognition.


Next I tried changing the device's locale to be Chinese, set the Chinese model to be active in vosk-android-service (I saw the check mark show up and I also killed the activity and started it again to see if the setting had persisted), and tested.

On a side note, it looks like changing the device's locale leads to automatic enabling of Google Voice Typing keyboards (one of my devices shows two of them -- one is labeled "Legacy"). If these (or maybe just one of them?) are not disabled, there is no option to choose vosk when AnySoftKeyboard's voice typing button is pressed. So it seems important to disable them before testing, so that one can choose vosk for recognition (I also turn off the device's WiFi).

Anyway, during testing, vosk-android-service's SpeechRecognizerActivity is displayed but I hear the failure sound quite soon after I see the activity. I don't really get a chance to say anything in this scenario.

I logged Locale.getDefault().toString() and found it to be: zh_CN_#Hans

The only other lines I see in logcat output that are prefixed with SpeechRecognizerActivity have content:

IIUC, the value of 2 for onError: 2 corresponds to "a generic client error" according to these docs.

Update: I think 2 might actually be ERROR_NETWORK, since at least according to these docs, onError's parameter code uses values from SpeechRecognizer.

That's a bit unexpected as I was not expecting network activity. I wonder if some other network-using recognition is being attempted...
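To make the numeric code easier to read in logs, here is a sketch of a lookup from the integer passed to onError to the name of the matching android.speech.SpeechRecognizer constant. It is plain Java so it runs without the Android SDK; the names and values are copied from the SpeechRecognizer constants for codes 1 through 9, and the class RecognizerErrorSketch is hypothetical. It assumes the code really does come from SpeechRecognizer, as the update above suggests.

```java
// Hypothetical helper; the table mirrors android.speech.SpeechRecognizer constants 1..9.
final class RecognizerErrorSketch {
    private static final String[] NAMES = {
        null,                              // 0 is not an error code
        "ERROR_NETWORK_TIMEOUT",           // 1
        "ERROR_NETWORK",                   // 2
        "ERROR_AUDIO",                     // 3
        "ERROR_SERVER",                    // 4
        "ERROR_CLIENT",                    // 5
        "ERROR_SPEECH_TIMEOUT",            // 6
        "ERROR_NO_MATCH",                  // 7
        "ERROR_RECOGNIZER_BUSY",           // 8
        "ERROR_INSUFFICIENT_PERMISSIONS"   // 9
    };

    // Returns the constant name, or a fallback for codes outside the table.
    static String name(int code) {
        if (code >= 1 && code < NAMES.length) {
            return NAMES[code];
        }
        return "unknown error " + code;
    }

    public static void main(String[] args) {
        // The value seen in the logcat output discussed above.
        System.out.println("onError: 2 -> " + name(2));
    }
}
```

With this table, 2 maps to ERROR_NETWORK, while the client-side error is 5 (ERROR_CLIENT).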


I got analogous unsuccessful results for Japanese. The value for Locale.getDefault().toString() was ja_JP, FWIW.
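A side note on the two locale strings reported above: Locale.toString() produces a legacy underscore form, whereas the RecognizerIntent documentation describes EXTRA_LANGUAGE as an IETF (BCP 47) language tag such as "en-US". The sketch below, in plain Java, shows the difference between toString() and toLanguageTag() for the same locales. The class LocaleTagSketch and its method names are hypothetical, and whether vosk-android-service would handle the tag form any differently is untested here.

```java
import java.util.Locale;

// Hypothetical helper comparing the two string forms of a Locale.
final class LocaleTagSketch {
    // Legacy form, as returned by Locale.toString().
    static String legacyString(String bcp47Tag) {
        return Locale.forLanguageTag(bcp47Tag).toString();
    }

    // BCP 47 form, as returned by Locale.toLanguageTag().
    static String languageTag(String bcp47Tag) {
        return Locale.forLanguageTag(bcp47Tag).toLanguageTag();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"zh-Hans-CN", "ja-JP"}) {
            System.out.println(legacyString(tag) + " vs " + languageTag(tag));
        }
    }
}
```

If the tag form were wanted at the putExtra call shown earlier, the argument would presumably become Locale.getDefault().toLanguageTag() instead of Locale.getDefault().toString().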

sogaiu commented 1 year ago

@nshmyrev Made some progress.

So it looks like somehow the device keeps switching the setting of Settings -> Apps -> Choose default apps -> Device assistance app -> Voice input to Speech Services by Google. When I explicitly set this to Vosk Speech Recognition and test, I have had success with both Chinese and Japanese (logs show these to be done by VoskRecognitionService).

Between these successful tests I noticed that the setting got switched back to Speech Services by Google. I'll try to figure out how this is happening -- I'm pretty sure I didn't do it directly / on purpose, but maybe there's something else I'm doing that's triggering the switch.

Based on the past log content I guess that until I changed this Voice input setting, the underlying recognition was being performed by Speech Services by Google (which also happens to support offline for just US English IIUC) -- so I think even the successful English tests from earlier were not being done by vosk (e.g. the logs mentioned SODA -- which IIUC is what originated behind Chrome's Live Caption feature). I think the ERROR_NETWORK mention above was perhaps caused by use of Speech Services by Google.

sogaiu commented 1 year ago

I think I understand what is triggering the Voice input setting to be Speech Services by Google.

IIUC, changing the active model via ModelListActivity doesn't seem to cause the new model to be used for recognition if some other model had already been loaded once before (at least while the corresponding app is still running?).

In order to test different models I was using the Force stop button for Vosk available via Settings -> Apps -> Vosk (Settings here refers to the device's Settings app). In testing here, it appears that after pressing the Force stop button, the Voice input setting gets set to Speech Services by Google.

sogaiu commented 1 year ago

Perhaps unsurprisingly, it looks like the Voice input setting can be in a different location depending on the device / OS version.

For example, in an AVD emulator of a Pixel 3a with API 33, I see it located at Settings -> System -> Languages & input -> Speech -> Voice input. Oddly, typing voice into the Search settings text field in the Settings app doesn't turn this up, so it appears that one must manually navigate to get to this setting...

As another example, in an AVD emulator of a Pixel 3a with API 28, I see it located at Settings -> Apps & notifications -> Default apps -> Assist & voice input -> Voice input. Typing voice into the Search settings text field in the Settings app DOES turn this up though, so that's a bit more convenient.

(These notes are meant as hints for future reference -- perhaps they will come in handy some day.)

nshmyrev commented 1 year ago

I think we mostly focus on open source android images - LineageOS, etc.

sogaiu commented 1 year ago

Ah I see, thanks for the info.

It's been a while since I've used LineageOS and I don't have any devices with any version installed. It looks like at least a few years ago there were no prebuilt AVD images.

I guess it's possible to build an image one can use via an emulator: https://wiki.lineageos.org/emulator

Will think about whether to try.

drew-sinha commented 1 year ago

Just a heads up: working for me on AnySoftKeyboard per my configurations in #29 & #30 using those minimal proposed fixes + the current master.

nshmyrev commented 1 year ago

Great job, closing then