Open chrsmj opened 3 months ago
This sounds like a major rearchitecture. Such things should really be discussed ahead of time. For example, this would need to be backwards compatible to go into any current branch, or even master.
The place to discuss such things is https://groups.io/g/asterisk-dev
The patch is backwards compatible. May we speak about it next week at AstriDevCon?
We could, but things should still happen or be recorded in a location where others can participate over a period of time and where it can be referenced for historical purposes.
Also, AEAP handles this by registering multiple engines, each with a unique name, so I don't see why this needs to be as aware of the special ^ syntax as it is.
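As a rough sketch of that AEAP approach: each client section in aeap.conf is registered as its own speech engine, addressed by its section name. The section names and ports below are hypothetical; the keys mirror the client example that appears later in this thread.

```ini
; aeap.conf (sketch; section names and ports are assumptions)
[stt-english]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text

[stt-spanish]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9100
protocol=speech_to_text
```

A channel would then select one of them by name, e.g. SpeechCreate(stt-english) or SpeechCreate(stt-spanish).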
There is precedent elsewhere for the Caret separator when things get clever, e.g. the Dial() 'b' and 'B' options. Colon is for string trimming. Comma is over-used - only added one more in the patch, heh. Extra parenthesis wrapping ))))) is a reason why some shudder and reach for a GPL instead. Bringing some of the cool parts of AEAP functionality back to the plain-jane dialplan DSL is one goal of this patch design.
And in the future, AEAP could be extended with this new horse decorator as well:
[my-speech-to-text]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text
[nyquist]
type=horse
url=ws://127.0.0.1:2016
[majestic_prince]
type=horse
url=ws://127.0.0.1:1969
Then you could race all four:
exten => 33729,1,NoOp(D-E-R-B-Y)
same = n,Set(TIMEOUT(D)=1)
same = n,Set(SPEECH_DTMF_MAXLEN=2)
same = n,SpeechCreate(vosk^mage)
same = n,SpeechCreate(vosk^secretariat)
same = n,SpeechCreate(my-speech-to-text^nyquist)
same = n,SpeechCreate(my-speech-to-text^majestic_prince)
same = n,SpeechBackground(dial-here-often,5,,mage^secretariat^nyquist^majestic_prince)
same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})
same = n,SpeechDestroy(vosk^mage)
same = n,SpeechDestroy(vosk^secretariat)
same = n,SpeechDestroy(my-speech-to-text^nyquist)
same = n,SpeechDestroy(my-speech-to-text^majestic_prince)
same = n,SpeechCreate(vosk^mage)
same = n,SpeechCreate(vosk^secretariat)
same = n,SpeechCreate(my-speech-to-text^nyquist)
same = n,SpeechCreate(my-speech-to-text^majestic_prince)
same = n,SpeechBackground(that-tickles,5,p,mage^secretariat^nyquist^majestic_prince)
same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})
same = n,SpeechDestroy(vosk^mage)
same = n,SpeechDestroy(vosk^secretariat)
same = n,SpeechDestroy(my-speech-to-text^nyquist)
same = n,SpeechDestroy(my-speech-to-text^majestic_prince)
> There is precedent elsewhere for the Caret separator when things get clever eg. Dial() 'b' and 'B' options.
The ^ and other characters are generally only used for suboptions within an option (or when the , would not work due to special parsing considerations).
> Comma is over-used
The comma is the standard argument separator - so that would be expected.
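For reference, this is roughly what the suboption use of ^ looks like in stock Asterisk. The Dial() 'b' option takes its pre-dial handler location as context^exten^priority, because a comma inside the option would be read as the next application argument. The context name and endpoint below are hypothetical.

```ini
; extensions.conf (sketch; predial-handler and PJSIP/alice are assumptions)
[predial-handler]
exten => s,1,NoOp(Pre-dial handler running on ${CHANNEL(name)})
same = n,Return()

[default]
; ^ separates context, extension and priority inside the b() suboption
exten => 100,1,Dial(PJSIP/alice,30,b(predial-handler^s^1))
```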
I think it's better to control this with the grammar field than to have multiple engine connections.
@nshmyrev That is what grammar was originally for in older engines, so that would make sense.
Current engine:grammar relationship on a channel is 1:N. This patch improves that to N:N.
Personally as a user, I would not want that. I would want to just be able to specify multiple grammars and have the engine provide the results back to me ordered in confidence.
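With a single engine, the existing dialplan functions already expose every result and its score, so a dialplan can walk them as sketched below. This uses the stock SPEECH(results), SPEECH_TEXT() and SPEECH_SCORE() functions. The extension number is hypothetical, and whether result 0 is the most confident one depends on the engine.

```ini
; extensions.conf (sketch; single default engine, no patch required)
exten => 33731,1,Answer()
same = n,SpeechCreate()
same = n,SpeechBackground(dial-here-often,5)
same = n,Set(i=0)
; Log every result the engine returned, with its score
same = n,While($[${i} < ${SPEECH(results)}])
same = n,NoOp(Result ${i}: ${SPEECH_TEXT(${i})} score=${SPEECH_SCORE(${i})})
same = n,Set(i=$[${i} + 1])
same = n,EndWhile()
same = n,SpeechDestroy()
same = n,Hangup()
```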
Agreed, that is nice if the engine ranks your results in order of confidence. But not all of them do (yet) cough vosk cough.
And what if your main cloud engine is lagged out, but your less-preferred local backup engines are still available?
This patch lets users solve it themselves in the dialplan.
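A minimal sketch of that fallback, assuming the patched engine^horse and SPEECH_TEXT(result^horse) syntax shown earlier in this thread. The extension number and the preference order (cloud horse nyquist first, local horse mage second) are assumptions for illustration, and none of this works on an unpatched Asterisk.

```ini
; extensions.conf (sketch; requires the proposed patch)
exten => 33730,1,NoOp(Prefer the cloud horse and fall back to a local one)
same = n,SpeechCreate(my-speech-to-text^nyquist)
same = n,SpeechCreate(vosk^mage)
same = n,SpeechBackground(dial-here-often,5,,nyquist^mage)
; Take the preferred horse's first result if it produced one
same = n,Set(answer=${SPEECH_TEXT(0^nyquist)})
; Otherwise fall back to the local horse
same = n,ExecIf($["${answer}" = ""]?Set(answer=${SPEECH_TEXT(0^mage)}))
same = n,NoOp(Chosen text: ${answer})
same = n,SpeechDestroy(my-speech-to-text^nyquist)
same = n,SpeechDestroy(vosk^mage)
same = n,Hangup()
```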
Okay, I think this really needs a set of non-implementation-specific requirements and user stories.
The AstriCon discussion was great, thanks!
Following the Kentucky Derby, I was reminded to clean up the patch a little and fix some memory leaks. Otherwise there is no big rewrite to address (potential) concerns about feeding the same codec frames to multiple different ASR backends. If you can live with that (say, you are only using local Vosk-Kaldi-Docker containers to translate one speaker into multiple languages simultaneously), then this might be a good fit for you.
Feature Description
Current Speech...() apps only let you work with one engine at a time. This presents challenges in multi-lingual IVR environments, situations where you are testing different engines, cloud evaporation events, etc.
The forthcoming patch will allow you to configure the horses inside your engine and sequentially feed them audio frames. It is still a little leaky, eschews linked lists for arrays, and requires additional patches to the engines themselves to do anything useful (patches also forthcoming there, e.g. for Vosk).
But the dialplan will look like this -- note the carets used to separate the horses (because :horse: :heart: :carrot: :)
And the corresponding res_speech_vosk.conf:
You might replace mage and secretariat with en and es...