BasedHardware / omi

AI wearables
https://omi.me
MIT License

Over long-lived connections, the STT service (Soniox) delays up to 20-30 seconds; is it the same with Deepgram? #1025

Closed: beastoin closed this 3 weeks ago

beastoin commented 1 month ago

Bro @josancamon19, feel free to note your experiences / findings here 😌

josancamon19 commented 1 month ago

Findings so far:

Soniox (No VAD)

Soniox (VAD)

What's the problem with VAD here, then?

  1. Soniox closes the connection after some time without receiving bytes, which causes reconnecting states (this should already be fixed by sending a ping every 10 seconds).
  2. Small words or short sentences (silence -> ~5 words -> silence) sometimes don't get detected; I think the model handles very fast condition switches poorly. They either never appear or get heavily delayed (30-60 secs).
  3. The last few words after a long conversation can be delayed very badly (also 30-60 secs).
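
Item 1 relies on a periodic keepalive so the socket never sits idle long enough for the server to close it. A minimal sketch of that timing logic, assuming a hypothetical `send_keepalive` callback supplied by the caller; the thread doesn't show Soniox's actual keepalive message, so the wire format is deliberately not modeled. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable


class KeepaliveScheduler:
    """Decides when to send a keepalive so the STT socket never sits idle too long.

    `send_keepalive` is a hypothetical callback; the real message format
    depends on the STT provider and is not modeled here.
    """

    def __init__(self, send_keepalive: Callable[[], None], interval_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_keepalive = send_keepalive
        self.interval_s = interval_s
        self.clock = clock
        self.last_sent = clock()  # treat "connection opened" as the last activity

    def on_audio_sent(self) -> None:
        # Real audio bytes also count as activity, so they reset the idle timer.
        self.last_sent = self.clock()

    def tick(self) -> bool:
        """Call periodically; sends a keepalive if the socket has been idle >= interval_s."""
        now = self.clock()
        if now - self.last_sent >= self.interval_s:
            self.send_keepalive()
            self.last_sent = now
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    sent_at = []
    ka = KeepaliveScheduler(lambda: sent_at.append(fake_now[0]), clock=lambda: fake_now[0])
    fake_now[0] = 5.0
    ka.tick()        # idle 5 s: nothing sent
    fake_now[0] = 10.0
    ka.tick()        # idle 10 s: keepalive sent
    print(sent_at)   # [10.0]
```

Note this only keeps the transport alive; as asked later in the thread, it would not help if the service also needs actual speech to stay responsive.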
josancamon19 commented 1 month ago

Deepgram is instant! After a 100 ms utterance, results come back almost instantly, every time.

With VAD it doesn't really make a difference: the delay becomes about 3 seconds, max 5.
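
For context on the VAD comparisons: with a VAD gate in front of the socket, silence is never sent, so the STT model only sees bursts of speech separated by gaps (which is also why a gated socket can go quiet long enough to need keepalives). A minimal sketch, assuming 16-bit little-endian mono PCM frames and a simple RMS-energy threshold swapped in for illustration; the thread doesn't say which VAD omi uses, and the `threshold` and `hangover_frames` values are arbitrary.

```python
import math
import struct
from typing import Iterable, Iterator


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of a little-endian 16-bit mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def vad_gate(frames: Iterable[bytes], threshold: float = 500.0,
             hangover_frames: int = 10) -> Iterator[bytes]:
    """Yield only frames that look like speech, plus a short hangover after speech.

    The hangover keeps a few trailing low-energy frames (word endings) so the
    STT service sees the end of an utterance instead of an abrupt cut.
    """
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover_frames
            yield frame
        elif remaining > 0:
            remaining -= 1
            yield frame
        # Otherwise: silence outside the hangover window is dropped, never sent.
```

The hangover is the main knob here: too short and word endings get clipped, too long and the gate forwards most of the silence anyway.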

The only annoying part is when switching sockets: a couple of seconds of audio might be lost at that point.
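
One possible way to avoid losing audio during a socket switch (not something the thread says omi does) is to keep the last few seconds of frames in a ring buffer and replay them into the new socket once it opens. A minimal sketch; `ReplayBuffer`, the buffer length, and the `send` callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class ReplayBuffer:
    """Keeps the most recent audio frames so they can be re-sent after a socket switch.

    Hypothetical helper: frame size, buffer length, and the `send` callback
    are illustrative assumptions, not omi's implementation.
    """

    def __init__(self, max_frames: int) -> None:
        # deque with maxlen drops the oldest frame automatically when full.
        self.frames: Deque[bytes] = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        self.frames.append(frame)

    def replay(self, send: Callable[[bytes], None]) -> int:
        """Send all buffered frames, oldest first, to a newly opened socket."""
        for frame in self.frames:
            send(frame)
        return len(self.frames)
```

For example, with 100 ms frames (an assumed size), `max_frames=20` covers about 2 seconds. The trade-off is that the new socket may transcribe the overlap twice, so duplicated words would need to be filtered on the transcript side.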

Short word bursts take 5-8 secs, but then it becomes instant again.

josancamon19 commented 1 month ago

Concerns about Deepgram:

  1. Lower transcript quality than Soniox :/
  2. It lacks native speech-profile identification, which limits future use, and it is not better than Soniox's native person recognition.

Concerns about Soniox:

  1. I initially thought it was faster than Deepgram, but once the socket has been connected for a long time, the delay grows.
  2. With VAD, the delay on short words, or after a long period of silence, is 6-10 times worse than Deepgram's.
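
To put numbers on "the delay grows the longer the socket is connected", one option is to log, for each final result, the time it arrived minus the time the matching audio was sent, alongside the session age. A minimal sketch, assuming audio is streamed in real time and each result reports where its audio ends in seconds from stream start (`audio_end_s` is a hypothetical field name; real STT APIs report timestamps under their own names).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LatencyTracker:
    """Tracks transcript delay over a long-lived STT session.

    Assumes real-time streaming, so audio second t was sent at
    stream_start + t, and that each result carries `audio_end_s`
    (a hypothetical field name for where its audio ends).
    """

    stream_start: float
    # (session_age_s, delay_s) for every result received
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def on_result(self, audio_end_s: float, received_at: float) -> float:
        sent_at = self.stream_start + audio_end_s
        delay = received_at - sent_at
        self.samples.append((received_at - self.stream_start, delay))
        return delay

    def mean_delay(self, since_age_s: float = 0.0) -> float:
        """Average delay of results received after the session reached `since_age_s`."""
        recent = [d for age, d in self.samples if age >= since_age_s]
        return sum(recent) / len(recent) if recent else 0.0
```

Comparing `mean_delay()` for the first few minutes against `mean_delay(since_age_s=...)` for later in the session would show whether the delay really grows with session age rather than with silence or short bursts.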
beastoin commented 1 month ago

Great finding!

beastoin commented 1 month ago

[Image not preserved]

beastoin commented 1 month ago

What if Soniox also needs actual words to keep the connection responsive? Then the keepalive will not help :?

beastoin commented 1 month ago

Sent an email to the Soniox team about the finding: 15-20 s delays after a long-lived session.

beastoin commented 3 weeks ago

Just did a check: it's smooth for me now. We may get back to this in the future.