brilliantlabsAR / noa-for-ios

Your AI companion. ChatGPT and translation for Monocle AR
ISC License
70 stars, 14 forks

[feature request] Local speech recognition #84

Open josuah opened 4 months ago

josuah commented 4 months ago

To save tokens and mobile data: perform the audio transcription locally on the phone. On-device transcription seems to give very good results, as it is already part of Apple Siri and Google Assistant, as well as the voice input used to dictate text messages.

[Image: IMG_0524]

https://discord.com/channels/963222352534048818/984966420603482182/1210382075363065906

lukeswitz commented 2 months ago

I ran into all sorts of problems because Monocle couldn't handle any real bandwidth. Using the phone from across the room was also more accurate. It's possible if the audio is chunked, but the lag makes for a slow experience. Maybe Frame has more throughput; I'll have to try it out when they land.
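To illustrate the chunked approach and where the lag comes from, here is a minimal Python sketch. The function names, chunk size, audio format, and link throughput are all made up for illustration; they are not measurements from Monocle or Frame.

```python
from typing import List


def split_into_chunks(audio: bytes, chunk_size: int) -> List[bytes]:
    """Split a raw audio buffer into fixed-size chunks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


def chunk_lag_seconds(chunk_size: int, audio_bytes_per_s: float, link_bytes_per_s: float) -> float:
    """Rough delay before a chunk can be transcribed: time to record it plus time to send it."""
    record_time = chunk_size / audio_bytes_per_s
    transfer_time = chunk_size / link_bytes_per_s
    return record_time + transfer_time


# Hypothetical numbers: 8 kHz, 8-bit mono audio (8000 B/s) sent over a 4000 B/s link.
audio = bytes(20_000)
chunks = split_into_chunks(audio, chunk_size=8_000)
print([len(c) for c in chunks])                 # [8000, 8000, 4000]
print(chunk_lag_seconds(8_000, 8_000, 4_000))   # 3.0
```

With these assumed numbers, each one-second chunk takes two more seconds to transfer, so the transcription falls further behind as you keep talking. Smaller chunks reduce the initial delay but do not help if the link is slower than the audio rate.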

josuah commented 2 months ago

The Frame device does have better Bluetooth bandwidth.

It is possible to choose between low- and high-resolution audio, and between low and high bitrates, to find a compromise between bandwidth and speed.
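As a rough illustration of that trade-off, the bitrate of uncompressed PCM audio is sample rate × bits per sample × channels. The sketch below compares a few formats against a link budget; the `LINK_BPS` value and the helper names are hypothetical and only there to show the arithmetic, not actual Monocle or Frame figures.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def fits_link(bitrate_bps: int, link_bps: int) -> bool:
    """True if the audio can be streamed in real time over the link."""
    return bitrate_bps <= link_bps


# Hypothetical link budget, for illustration only.
LINK_BPS = 100_000

for rate, bits in [(16_000, 16), (8_000, 16), (8_000, 8)]:
    bps = pcm_bitrate(rate, bits)
    print(f"{rate} Hz, {bits}-bit: {bps} bit/s, real-time: {fits_link(bps, LINK_BPS)}")
```

Under this assumed budget only the 8 kHz, 8-bit stream (64000 bit/s) fits in real time, which is why lowering the resolution or compressing the audio matters.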

No one has experimented with audio compression using StreamLogic yet. Audio compression for Frame was suggested here: https://github.com/brilliantlabsAR/frame-codebase/issues/134#issuecomment-2093907836

But it seems like the FPGA is already full with JPEG encoding, so a trade-off would be needed to fit FPGA-based compression.