deeeed / expo-audio-stream


Squeaky voices on web #19

Open raghavsethi opened 2 hours ago

raghavsethi commented 2 hours ago

In our testing we used Deepgram to transcribe in real time, which works very well with expo-audio-stream. However, we also wanted a fallback for when the user goes offline or has a poor connection: sending the entire recorded file to Whisper at the end.

We got very poor results with Whisper and were pretty confused until I tried playing the WAV file we were sending: it turns out the pitch is wrong! We thought we had misconfigured something on our end, so to narrow down the problem I replaced our root component with exactly the code here, and it has the same problem:

https://github.com/user-attachments/assets/4ba852af-f7ca-4bcf-aa33-6dcbb1410419

Any ideas what might be going on, or how to debug it? We tried different sample rates and other settings and couldn't find anything that worked (other settings sounded even worse).

Deepgram is able to make sense of the streaming audio frames (which we just dump as base64/byte arrays into the websocket), which implies that the raw data is probably valid but something else is going wrong.
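One way to check whether the container, rather than the raw samples, is at fault is to read the `fmt ` chunk of the saved WAV file and compare it with the settings the recorder was asked to use. The following is a minimal sketch, assuming a standard RIFF/WAVE file; `readWavFormat` and `WavFormat` are hypothetical helper names for illustration and are not part of expo-audio-stream.

```typescript
// Fields from the WAV "fmt " chunk that determine playback pitch and speed.
interface WavFormat {
  audioFormat: number;   // 1 = integer PCM, 3 = IEEE float
  numChannels: number;
  sampleRate: number;    // Hz, as declared in the header
  bitsPerSample: number; // bit depth, as declared in the header
}

// Read a 4-character chunk tag (e.g. "RIFF", "fmt ") at the given offset.
function readTag(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );
}

// Hypothetical helper: walk the RIFF chunks until the "fmt " chunk is found.
function readWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let offset = 12; // first chunk starts after "RIFF" + size + "WAVE"
  while (offset + 8 <= view.byteLength) {
    const id = readTag(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    if (id === "fmt ") {
      return {
        audioFormat: view.getUint16(offset + 8, true),
        numChannels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  throw new Error("no fmt chunk found");
}
```

If the header's sample rate or bit depth differs from what the capture actually produced, a player will read the same bytes at the wrong speed, which would be heard as a pitch shift even though the streamed frames themselves are fine.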

deeeed commented 2 hours ago

Can you try keeping the bitDepth at 32-bit PCM on web? The "squeakiness" can come from rescaling the audio. You can also compare the values in the playground app: if you open it on each platform, it should detect the correct settings.
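For context on why rescaling matters: the Web Audio API delivers samples as 32-bit floats in the range [-1, 1], so writing 16-bit PCM requires a conversion step, and the bit depth written in the WAV header has to match the data that was actually written. The sketch below illustrates both points. It is not the library's implementation, and `floatTo16BitPcm` and `wavDurationSeconds` are hypothetical names used only for this example.

```typescript
// Hypothetical helper: convert Web Audio float samples ([-1, 1]) to 16-bit PCM.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input does not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Hypothetical helper: playback duration a player derives from the header.
function wavDurationSeconds(
  dataBytes: number,
  sampleRate: number,
  numChannels: number,
  bitsPerSample: number,
): number {
  const bytesPerSecond = sampleRate * numChannels * (bitsPerSample / 8);
  return dataBytes / bytesPerSecond;
}

// One second of 16 kHz mono 16-bit audio occupies 32000 bytes.
const correctDuration = wavDurationSeconds(32000, 16000, 1, 16);
// The same bytes labelled as 32-bit appear to last half as long.
const mislabelledDuration = wavDurationSeconds(32000, 16000, 1, 32);
```

Under these assumptions, a mismatch between the declared and actual bit depth changes how many bytes a player consumes per second, so comparing the header against the recording settings on each platform is a reasonable first check.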