BasedHardware / omi

AI wearables
https://omi.me
MIT License

Speech Profile [Part 1] ~ Opus #556

Closed josancamon19 closed 3 months ago

josancamon19 commented 3 months ago

Is your feature request related to a problem? Please describe. https://github.com/BasedHardware/Friend/blob/7539e2eb315ca3fba4ce07b8ed1057b03dfdc09e/backend/routers/transcribe.py#L66

The websocket shift doesn't work with Opus: even after uploading all the samples from an Opus recording, the Samples tab in the app shows that we have a local 16 kHz WAV file, but apparently not one encoded in Opus.

ebariaux commented 3 months ago

I haven't looked at the app or server code but wanted to react to your comment on "we have locally a wav file with 16khz, but apparently not in opus".

You need to distinguish the container (file format) from the codec. Raw audio (your content) is uncompressed LPCM (linear PCM), just the bytes representing the samples. WAV is a container: it usually stores uncompressed LPCM audio, simply adding a header to the raw bytes. It supports some basic compression schemes, but not Opus AFAIK. Opus is a codec: it takes LPCM and encodes it into compressed Opus data, or decodes Opus data back to LPCM.

A (far from perfect) analogy: your content is text. Your container / file format could be a Word document. You can zip your text (the content), e.g. to send it over the wire, but you don't store a zip inside a Word document. You capture text on your Omi device, zip it, send it over BLE, unzip it on your phone, and store it in a Word document. Same with audio: capture on your Omi, encode with Opus, send over BLE, decode with Opus (you get LPCM back), and store it in a WAV file.
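To make the container/codec split concrete, here is a minimal sketch of the last step of that pipeline using only Python's stdlib `wave` module. The Opus-decode step itself would need an Opus binding (not shown here), so this sketch just fakes the decoded output: a buffer of 16 kHz mono 16-bit LPCM samples. The point is that the WAV file is nothing more than a header wrapped around those exact bytes.

```python
import struct
import wave

# Pretend this is what came back from the Opus decoder: 10 ms of
# 16 kHz mono 16-bit LPCM (160 samples of silence, little-endian).
SAMPLE_RATE = 16000
pcm = struct.pack("<160h", *([0] * 160))

# "Storing in a WAV file" = writing a header that describes the LPCM
# layout, then appending the raw sample bytes unchanged.
with wave.open("sample.wav", "wb") as w:
    w.setnchannels(1)           # mono
    w.setsampwidth(2)           # 16-bit samples
    w.setframerate(SAMPLE_RATE)
    w.writeframes(pcm)          # raw LPCM bytes, not re-encoded

# Reading it back shows the content is still the same plain LPCM.
with wave.open("sample.wav", "rb") as r:
    assert r.getframerate() == SAMPLE_RATE
    assert r.readframes(r.getnframes()) == pcm
```

If the app instead wrote the still-compressed Opus packets into a `.wav` with a 16 kHz LPCM header, the file would claim to be LPCM while containing something else, which would match the symptom described above.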

Hope some of this makes sense...

josancamon19 commented 3 months ago

Done