Closed — josuah closed this 2 weeks ago
- **Fewest data transmitted:** use a local speech recognition engine and a local translation service.
  - (-) A good model may take a huge amount of storage and offer only moderate performance.
  - (+) Works without Internet at all.
- **Compromise between local and remote:** use a local speech recognition engine and a translation service on a server.
  - (-) Still requires a lot of local data to transcribe audio to text in any language.
  - (+) Very little data transmitted.
- **Most data transmitted:** transmit the audio directly to an online "all in one" service like Whisper.
  - (-) Requires the most credits.
  - (+) Allows using a good-quality translation service.
This one is hard because there doesn't seem to be a service that supports streamed audio input: they all expect a complete file. To make it work, the app would have to somehow chop up the audio between words, but that boundary is hard to predict, since context and word order matter for correct translation. Hopefully some provider releases something like this soon, or if something already exists, we can try it out. This issue is already noted in our new backend repo, so closing here.
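As a rough idea of what "chopping up the audio" could look like, here is a minimal sketch of energy-based segmentation: buffer PCM samples and cut a chunk whenever a pause follows some speech, so each chunk could be sent to a transcription API as a complete file. All names, frame sizes, and thresholds here are made-up assumptions, not part of any existing service; a real implementation would use a proper VAD, and this approach still risks splitting mid-sentence, which is exactly the word-order problem described above.

```python
def split_on_silence(samples, frame_len=160, threshold=500, min_silent_frames=3):
    """Split 16-bit PCM samples into chunks at silent gaps (a very rough VAD).

    Hypothetical parameters: frame_len samples per analysis frame, an RMS
    threshold below which a frame counts as silence, and the number of
    consecutive silent frames that ends a chunk.
    """
    chunks, current, silent_run, has_speech = [], [], 0, False
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms < threshold:
            silent_run += 1
        else:
            silent_run, has_speech = 0, True
        current.extend(frame)
        # flush a chunk only once a pause follows actual speech,
        # so long stretches of pure silence don't produce empty chunks
        if has_speech and silent_run >= min_silent_frames:
            chunks.append(current)
            current, silent_run, has_speech = [], 0, False
    if current and has_speech:
        chunks.append(current)
    return chunks


# Synthetic example: speech / silence / speech / silence
chunks = split_on_silence([1000] * 800 + [0] * 800 + [1000] * 800 + [0] * 800)
print(len(chunks))  # two speech chunks, trailing silence discarded
```

Each resulting chunk would then be wrapped in a container (e.g. WAV) and submitted as one request, which is the workaround the lack of streaming input forces on us.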
To use the glasses in a conversation, the side of the glasses must currently be tapped to trigger each new translation request.
This prevents using them continuously in an uninterrupted conversation, since the interlocutor has to be asked to speak and then pause.
The ability to translate the audio continuously was requested.