Add support for websocket server to start_speech_recog

Similar to #79. We could implement support for something like local whisper transcription, but I usually run sip-lab in low-end VMs with limited resources. Instead, we can just establish a WebSocket connection to a speech server and stream the audio to it. The server can then use any STT engine, such as gsr or whisper.

The function call would look like this:

sip.call.start_speech_recog(call_id, {
  server_url: 'ws://127.0.0.1/speech_recog',
  engine: 'whisper',
  language: 'en-US',
  media_id: 0, // optional
})
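
To make the idea more concrete, here is a rough sketch of what the client side could send and receive over the WebSocket. Nothing here exists in sip-lab: the helper names (`buildStartMessage`, `parseTranscript`) and the message shapes (`type: 'start'`, `type: 'transcript'`, `is_final`) are assumptions for illustration only. The sketch assumes one JSON text frame is sent right after connecting, with the audio following as binary frames, and that `media_id` is used only locally to pick the stream, so it is not sent to the server.

```javascript
// Hypothetical sketch: these helpers and message shapes are NOT part of sip-lab.

// Build the JSON text frame sent once, right after the WebSocket opens.
// Assumption: audio follows afterwards as binary frames.
function buildStartMessage(opts) {
  if (typeof opts.server_url !== 'string' || !/^wss?:\/\//.test(opts.server_url)) {
    throw new Error('server_url must be a ws:// or wss:// URL');
  }
  if (typeof opts.engine !== 'string' || opts.engine === '') {
    throw new Error('engine is required');
  }
  // media_id is deliberately left out: it only selects the local media stream.
  return JSON.stringify({
    type: 'start',
    engine: opts.engine,
    language: opts.language,
  });
}

// Parse a text frame received from the speech server.
// Returns null for anything that is not a transcript message.
function parseTranscript(data) {
  const msg = JSON.parse(data);
  if (msg.type !== 'transcript') return null;
  return { text: String(msg.text), is_final: Boolean(msg.is_final) };
}
```

Keeping the engine and language in a single start message would mean the speech server can choose the STT backend per connection, so sip-lab itself would not need to know anything about gsr or whisper.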

MayamaTakeshi / sip-lab

Add support for websocket server to start_speech_recog #80