BasedHardware / omi

AI wearables
https://omi.me
MIT License

Speech Profile [Part 1] ~ Opus #556

Closed josancamon19 closed 3 months ago

josancamon19 commented 3 months ago

Is your feature request related to a problem? Please describe. https://github.com/BasedHardware/Friend/blob/7539e2eb315ca3fba4ce07b8ed1057b03dfdc09e/backend/routers/transcribe.py#L66

The websocket shift doesn't work with Opus: even after uploading all the samples from an Opus recording, the Samples tab in the app shows that we have a local 16 kHz WAV file, but apparently not one encoded in Opus.

ebariaux commented 3 months ago

I haven't looked at the app or server code but wanted to react to your comment on "we have locally a wav file with 16khz, but apparently not in opus".

You need to distinguish the container (file format) from the codec. Raw audio (your content) is uncompressed LPCM (linear PCM), just the bytes representing the samples. WAV is a container: it usually stores uncompressed LPCM audio, simply adding a header to the raw bytes. It supports some basic compression schemes, but not Opus AFAIK. Opus is a codec: it takes LPCM and encodes it into compressed Opus data, or decodes Opus data back to LPCM.

A (far from perfect) analogy: your content is text. Your container / file format could be a Word document. You can zip your text (the content), e.g. to send it over the wire, but you don't store a zip inside a Word document. You capture text on your Omi device, zip it, send it over BLE, unzip it on your phone, and store it in a Word document. Same with audio: capture on your Omi, encode with Opus, send over BLE, decode with Opus (you get LPCM back), and store it in a WAV file.
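To make the container/codec split concrete, here is a minimal sketch of the last step of that pipeline using only Python's stdlib `wave` module. The Opus-decode step itself would need an Opus binding (not shown here), so this sketch just fakes the decoded output: a buffer of 16 kHz mono 16-bit LPCM samples. The point is that the WAV file is nothing more than a header wrapped around those exact bytes.

```python
import struct
import wave

# Pretend this is what came back from the Opus decoder: 10 ms of
# 16 kHz mono 16-bit LPCM (160 samples of silence, little-endian).
SAMPLE_RATE = 16000
pcm = struct.pack("<160h", *([0] * 160))

# "Storing in a WAV file" = writing a header that describes the LPCM
# layout, then appending the raw sample bytes unchanged.
with wave.open("sample.wav", "wb") as w:
    w.setnchannels(1)           # mono
    w.setsampwidth(2)           # 16-bit samples
    w.setframerate(SAMPLE_RATE)
    w.writeframes(pcm)          # raw LPCM bytes, not re-encoded

# Reading it back shows the content is still the same plain LPCM.
with wave.open("sample.wav", "rb") as r:
    assert r.getframerate() == SAMPLE_RATE
    assert r.readframes(r.getnframes()) == pcm
```

If the app instead wrote the still-compressed Opus packets into a `.wav` with a 16 kHz LPCM header, the file would claim to be LPCM while containing something else, which would match the symptom described above.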

Hope some of this makes sense...

josancamon19 commented 3 months ago

Done