Similar to #79.
We could, for example, implement support for local Whisper transcription, but I usually run sip-lab in low-end VMs with limited resources.
Instead, we can simply establish a WebSocket connection to a Speech Server and stream the call audio to it. That way we can use any STT engine behind it, such as gsr, Whisper, etc.
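To make the streaming idea concrete, here is a minimal sketch of the client-side framing, assuming the Speech Server accepts raw 8 kHz, 16-bit mono linear PCM sent as WebSocket binary messages of 20 ms each. The audio format, frame duration, and the helper names `pcm_frames` and `ws_binary_frame` are hypothetical and not part of sip-lab or any Speech Server API. The HTTP upgrade handshake and any setup message the server may require are not shown. In practice a WebSocket library would do the frame encoding; it is spelled out here only to show what goes over the wire per RFC 6455.

```python
import os
import struct
from typing import Iterator, Optional


def pcm_frames(pcm: bytes, sample_rate: int = 8000, sample_width: int = 2,
               frame_ms: int = 20) -> Iterator[bytes]:
    """Split raw mono PCM into fixed-duration chunks (the last one may be shorter).

    Assumption: 8 kHz / 16-bit / 20 ms defaults, i.e. 320 bytes per chunk.
    """
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]


def ws_binary_frame(payload: bytes, mask_key: Optional[bytes] = None) -> bytes:
    """Encode one final, masked client-to-server binary frame (RFC 6455, opcode 0x2)."""
    if mask_key is None:
        mask_key = os.urandom(4)  # clients must mask every frame they send
    if len(mask_key) != 4:
        raise ValueError("mask_key must be exactly 4 bytes")
    header = bytearray([0x80 | 0x2])  # FIN=1, opcode=binary
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)  # MASK bit + 7-bit payload length
    elif n <= 0xFFFF:
        header.append(0x80 | 126)  # MASK bit + marker for 16-bit length
        header += struct.pack("!H", n)
    else:
        header.append(0x80 | 127)  # MASK bit + marker for 64-bit length
        header += struct.pack("!Q", n)
    # XOR each payload byte with the rotating 4-byte masking key
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


if __name__ == "__main__":
    one_second_of_silence = bytes(16000)  # 8000 samples * 2 bytes
    frames = [ws_binary_frame(chunk) for chunk in pcm_frames(one_second_of_silence)]
    print(len(frames), len(frames[0]))  # 50 frames, 328 bytes each (4 header + 4 mask + 320)
```

With these assumptions, one second of audio becomes 50 small binary messages, which keeps the load on a low-end VM limited to copying and sending bytes while the Speech Server does the actual recognition.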
The function call would be like this: