BasedHardware / omi

AI wearables
https://omi.me
MIT License
3.57k stars 426 forks

Auto detect language #1076

Open Christofon opened 2 days ago

Christofon commented 2 days ago

Is your feature request related to a problem? Please describe. I speak multiple languages during the day; currently I have to manually switch languages in the app.

Describe the solution you'd like As far as I can see, the audio is being streamed to Deepgram for transcription. If this were switched to batched requests (essentially using the pre-recorded API), Deepgram could auto-detect the language being spoken.
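A minimal sketch of what the batched approach could look like, using only the standard library. The `/v1/listen` endpoint and the `detect_language` query parameter are from Deepgram's public pre-recorded API; the file path, API key, and response-field handling here are illustrative assumptions, not Omi's actual code.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_url(detect_language=True):
    """Build the pre-recorded transcription URL with language detection on."""
    params = {"detect_language": str(detect_language).lower()}
    return DEEPGRAM_URL + "?" + urllib.parse.urlencode(params)


def transcribe_file(path, api_key):
    """Send one finished recording and return (detected_language, transcript).

    Placeholder sketch: real code would need retries, chunking, and
    content-type handling for Omi's actual audio format.
    """
    with open(path, "rb") as f:
        req = urllib.request.Request(
            build_url(),
            data=f.read(),
            headers={
                "Authorization": "Token " + api_key,
                "Content-Type": "audio/wav",
            },
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    channel = body["results"]["channels"][0]
    # With detect_language=true, Deepgram reports a detected_language per channel.
    return channel.get("detected_language"), channel["alternatives"][0]["transcript"]
```

The trade-off is latency: the live stream gives word-by-word results, while a batched request only returns once the recording segment is uploaded and processed.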

Describe alternatives you've considered Make switching languages easier in the app, e.g., by marking specific languages as favourites and providing a toggle between them.

Additional context I would gladly take a shot at this myself but wanted to make sure there isn't anything obvious I am missing.

ArchonMegalon commented 2 days ago

We are raising our kids bilingual. I speak only German, my wife only English, so when we speak to each other it is a constant mix of the two languages. Omi doesn't handle that very well.

Our kids are currently not (yet) able to differentiate the two languages and use vocabulary from both in the same sentence.

Every human can understand that easily, but Omi utterly fails to do that.

AnkushMalaker commented 27 minutes ago

This is a difficult problem to solve, called code mixing. There have been various attempts to solve it, but even the most prominent speech-recognition models right now (e.g. Whisper) are multilingual yet transcribe a single language per query.

e3e6 commented 19 minutes ago

> This is a difficult problem to solve, called code mixing. There have been various attempts to solve it, but even the most prominent speech-recognition models right now (e.g. Whisper) are multilingual yet transcribe a single language per query.

I understand that transcribing several languages within a single conversation might not be solvable at this point, but what we might be talking about here is identifying the language of a particular memory. For example, I use Russian at home but English at work, and it's not really convenient to toggle languages in the app.

Have you seen any model focused on identifying the language of a voice recording?
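One candidate worth noting: Whisper is not an LLM, but the open-source `openai-whisper` package exposes a language-identification step (`model.detect_language`) that could tag each memory with a single language before transcription. A hedged sketch, where the model name and audio path are placeholders:

```python
def pick_language(probs):
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def identify_language(audio_path, model_name="base"):
    """Guess the dominant language of a recording with Whisper.

    Sketch only: requires the openai-whisper package and its ffmpeg
    dependency, so the import is kept inside the function.
    """
    import whisper

    model = whisper.load_model(model_name)
    # Whisper's language ID looks at up to the first 30 seconds of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)
    return pick_language(probs)
```

This would fit the per-memory case described above: run language ID once on the recording, then pass the detected language to the transcription backend, instead of asking the user to toggle it manually.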