MayamaTakeshi / sip-lab

A node module that helps to write SIP functional tests

Consider creating a dual-tone voice #92

Open · MayamaTakeshi opened this issue 3 months ago

MayamaTakeshi commented 3 months ago

In my tests I have been using DTMF as a "voice" to be detected by a speech recognition system that understands DTMF. However, this conflicts with cases where the speech recognition system also supports real DTMF detection.

As an alternative, I am implementing speech synthesis and recognition using flite and pocketsphinx. However, at an 8000 Hz sampling rate (pcmu/pcma, PSTN), pocketsphinx does not perform well, and even with speex at 16000 Hz, where pocketsphinx shows better results, it would still require trial-and-error tuning.

As another alternative, I am planning to add support for external speech synthesis/recognition via a websocket server that would proxy requests to google speech, amazon polly, openai whisper (which can be run locally), etc.

But still, we might prefer an approach that does not rely on voice recognition at all. For that, we can try extending the DTMF tones and creating a new "dual-tone" voice. This way, there will be no conflict.
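As a rough illustration of the idea, here is a minimal C sketch that generates one "dual-tone" symbol as the sum of two sine waves, analogous to DTMF but using a frequency pair that does not belong to the standard DTMF row/column sets (697/770/852/941 Hz and 1209/1336/1477/1633 Hz), so an ordinary DTMF detector should ignore it. The frequency pair (500 Hz + 1900 Hz), symbol length and amplitudes are assumptions for the example, not a proposed spec.

```c
/* Sketch: synthesize one hypothetical "dual-tone" symbol (two summed sines). */
#include <math.h>
#include <stdint.h>
#include <stddef.h>

#define SAMPLE_RATE 8000
#define TWO_PI      6.283185307179586

/* Fill buf with n samples of a two-frequency tone (frequencies are assumptions). */
static void dual_tone_fill(int16_t *buf, size_t n, double f_low, double f_high)
{
    for (size_t i = 0; i < n; i++) {
        double t = (double) i / SAMPLE_RATE;
        /* scale each component so the sum stays within 16-bit range */
        double s = 0.5 * sin(TWO_PI * f_low * t) + 0.5 * sin(TWO_PI * f_high * t);
        buf[i] = (int16_t) (s * 16383.0);
    }
}

int main(void)
{
    int16_t symbol[SAMPLE_RATE / 10];   /* 100 ms symbol at 8000 Hz */
    /* Example pair chosen only to avoid the standard DTMF frequencies. */
    dual_tone_fill(symbol, sizeof(symbol) / sizeof(symbol[0]), 500.0, 1900.0);
    return 0;
}
```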

MayamaTakeshi commented 3 months ago

https://www.montana.edu/rmaher/eele477_sp18/EELE_477_Lab_09.pdf

https://chat.openai.com/share/84e6ce44-5ab4-4cb6-8072-cc9f22676e74

So let's try patching https://github.com/freeswitch/spandsp/blob/master/src/dtmf.c and see what happens.
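For the detection side, the basic technique is a Goertzel power estimate per candidate frequency, which is also what dtmf.c is built around. The snippet below is a standalone sketch of that technique, not an actual patch to spandsp (its internal tables and descriptors are not reproduced here); a real patch would add the new frequency pairs alongside the existing DTMF row/column tables. The threshold and the `dual_tone_present` helper are hypothetical.

```c
/* Sketch: Goertzel-based presence check for a hypothetical dual-tone pair. */
#include <math.h>
#include <stdint.h>
#include <stddef.h>

#define SAMPLE_RATE 8000
#define TWO_PI      6.283185307179586

/* Goertzel magnitude-squared of target_freq over n samples. */
static double goertzel_power(const int16_t *buf, size_t n, double target_freq)
{
    double k = 2.0 * cos(TWO_PI * target_freq / SAMPLE_RATE);
    double s0, s1 = 0.0, s2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        s0 = buf[i] + k * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - k * s1 * s2;
}

/* A symbol is considered present if both of its frequencies exceed the threshold. */
int dual_tone_present(const int16_t *buf, size_t n,
                      double f_low, double f_high, double threshold)
{
    return goertzel_power(buf, n, f_low) > threshold
        && goertzel_power(buf, n, f_high) > threshold;
}
```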