Kaljurand / K6nele

An Android app that offers speech-to-text user interfaces to other apps
http://kaljurand.github.io/K6nele/
Apache License 2.0

Dynamically Enable/Disable Rewrite Rules via Buttons #100

Closed devycarol closed 1 year ago

devycarol commented 1 year ago

Rewrite rules can be configured so that the voice input will, for example, only interpret words as their punctuation homophones (e.g. "period" > ., "brace" > {, etc.). It would be really beneficial if the keyboard could have rewrite rule toggle buttons added to it, so that the user could enable their own custom "punctuation mode," "spelling mode," "shift key," etc. on the fly.

Kaljurand commented 1 year ago

Interesting idea...

You can make a rewrite rule / button that calls: ee.ioc.phon.android.speak/.activity.GetPutPreferenceActivity with extras:

E.g. "Command" = "activity" and "Arg1" =

{
  "component": "ee.ioc.phon.android.speak/.activity.GetPutPreferenceActivity",
  "extras": {
    "key": "defaultRewriteTables",
    "val": ["punctuation", "spelling"]
  }
}

i.e. you would need to list all the tables, so it's not quite the same as toggling. I guess the toggling feature could be added to GetPutPreferenceActivity quite easily.
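
To illustrate the difference between listing and toggling, here is a minimal Python sketch of how the full table list could be recomputed before it is pasted into "Arg1". It is untested against K6nele; `toggle_table` and `build_arg1` are hypothetical helper names, and the only things taken from the snippet above are the component name and the "key"/"val" extras.

```python
import json

# Component name as given in the comment above.
COMPONENT = "ee.ioc.phon.android.speak/.activity.GetPutPreferenceActivity"


def toggle_table(active, name):
    """Return a new list: `name` removed if it was active, appended otherwise."""
    if name in active:
        return [t for t in active if t != name]
    return active + [name]


def build_arg1(tables):
    """Build the JSON for "Arg1" that sets the complete list of active tables."""
    return json.dumps(
        {
            "component": COMPONENT,
            "extras": {"key": "defaultRewriteTables", "val": tables},
        },
        indent=2,
    )


# Example: "spelling" is currently active, so toggling it leaves only "punctuation".
active = toggle_table(["punctuation", "spelling"], "spelling")
print(build_arg1(active))
```

The point of the sketch is that the caller has to know the current list in order to toggle one entry, which is exactly the state a plain rewrite rule does not have.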

So this would be possible already via the existing features (although I haven't tested it). Any improvements beyond that would require some thinking/testing, e.g.

devycarol commented 1 year ago

Honestly the more I think about it the more it becomes a STT-API-level problem. Because say we get the toggle buttons on the main keyboard interface, cool, but then we have to deal with getting the API to handle each individual 'mode'—not to mention multiple at once. Some of these potential modes in my "dream" scenario go beyond simple rewrites into telling the API to only return certain characters/phrases—I'm not sure about the open source ones, but I believe Google's is incompatible with such functionality.

But I imagine that if that API-level puzzle were solved, then having the buttons would simply be a matter of allowing their addition, linking the timestamps of the words outputted with the rules that were active at the time, as well as letting certain buttons disable others when enabled—"only allow words, forcing lowercase" and "allow punctuation only" don't exactly mix very well 😅
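
The "certain buttons disable others" part could be sketched roughly like this in Python. It is purely illustrative: the mode names and the `CONFLICTS` table are made up, and nothing like this exists in K6nele as far as this thread shows.

```python
# Hypothetical conflict table: enabling the key mode switches off the listed modes.
CONFLICTS = {
    "words_only_lowercase": {"punctuation_only"},
    "punctuation_only": {"words_only_lowercase"},
}


def enable_mode(active, mode):
    """Return the set of active modes after enabling `mode`.

    Any mode that conflicts with `mode` is dropped; unrelated modes stay on.
    """
    return (set(active) - CONFLICTS.get(mode, set())) | {mode}


# Enabling "punctuation_only" turns off "words_only_lowercase" but keeps "spelling".
modes = enable_mode({"words_only_lowercase", "spelling"}, "punctuation_only")
print(sorted(modes))
```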

Kaljurand commented 1 year ago

Yes, applying the rewrite rules in post-processing to whatever text the service returns by default would maybe cover the simpler use cases, but wouldn't be expressive enough to deal with homophones etc. in general. Also, the rewrite rules only see the returned formatted text, but not any meta-info that the service might send back via its API (such as timestamps, alternative hypotheses, unformatted results).

The rewrite rules can send queries to a REST API (via FetchUrlActivity.java as done in https://docs.google.com/spreadsheets/d/1lxvkGerd_WMljca0dsgxViw_5cnOEgDzneBL-uXI-xI/edit#gid=0, or by using the "getUrl" command). So, if the service exposes certain features via a REST API (e.g. switching between language models), then you might be able to have a button that switches these features on and off between utterances.
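
As a rough idea of what the server side of such a button could look like, here is a small Python sketch using only the standard library. Everything in it is an assumption for illustration: the `/toggle?feature=...` endpoint, the feature names, and the port are invented, and a real recognition service would expose its own API.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical feature flags held by the recognition server.
FEATURES = {"punctuation": False, "spelling": False}


def apply_toggle(url_path, features):
    """Flip the flag named by ?feature=... and return (status, body)."""
    parsed = urlparse(url_path)
    names = parse_qs(parsed.query).get("feature", [])
    if parsed.path != "/toggle" or not names or names[0] not in features:
        return 404, "unknown feature\n"
    name = names[0]
    features[name] = not features[name]
    return 200, f"{name}={'on' if features[name] else 'off'}\n"


class ToggleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Each GET flips one flag; the next utterance would be decoded with it.
        status, body = apply_toggle(self.path, FEATURES)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Start the toggle endpoint (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), ToggleHandler).serve_forever()
```

After calling `serve()`, a button whose rule fetches `http://<server>:8080/toggle?feature=punctuation` would flip that flag on each press; the server address and URL are placeholders.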

Or maybe you can set up multiple services, each possibly with the same backend server, but configured differently, and then use the existing service switching button(s) to effectively switch between features. In this case you'd probably have to implement each service as a separate lightweight app, because Android (probably) does not support spawning services dynamically.