[new-feature]: Add horse power in to speech engines

chrsmj commented 3 months ago

Feature Description

Current Speech...() apps only let you work with one engine at a time. This presents challenges in multi-lingual IVR environments, situations where you are testing different engines, cloud evaporation events, etc.

The forthcoming patch will allow you to configure the horses inside your engine and sequentially feed them audio frames. It still leaks a little memory, eschews linked lists for arrays, and requires additional patches to the engines themselves to do anything useful (patches are also forthcoming there, e.g. for Vosk).

But the dialplan will look like this -- note the carets used to separate the horses (because :horse: :heart: :carrot: :)

exten => 46773,1,NoOp(H-O-R-S-E) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)

And the corresponding res_speech_vosk.conf:

[general]

[mage]
type=horse
url = ws://localhost:2700

[secretariat]
type=horse
url = ws://localhost:2701

You might replace mage and secretariat with en and es...
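
As a sketch of that substitution: the config and dialplan below reuse the horse syntax proposed above, renamed for a bilingual setup. Which model each Vosk server loads, and the variable names text_en and text_es, are assumptions made for illustration and are not confirmed by the patch.

```
; res_speech_vosk.conf (sketch; assumes each port serves a different model)
[general]

; assumed: Vosk server loaded with an English model
[en]
type=horse
url = ws://localhost:2700

; assumed: Vosk server loaded with a Spanish model
[es]
type=horse
url = ws://localhost:2701

; extensions.conf (sketch; same caret syntax as the example above)
exten => 46773,1,NoOp(H-O-R-S-E)
 same = n,SpeechCreate(vosk^en)
 same = n,SpeechCreate(vosk^es)
 same = n,SpeechBackground(dial-here-often,5,,en^es)
 same = n,Set(text_en=${SPEECH_TEXT(0^en)})
 same = n,Set(text_es=${SPEECH_TEXT(0^es)})
 same = n,SpeechDestroy(vosk^en)
 same = n,SpeechDestroy(vosk^es)
```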

jcolp commented 3 months ago

This sounds like a major rearchitecture. Such things should really be discussed ahead of time; for example, this would need to be backwards compatible to go into any current branches or even master.

jcolp commented 3 months ago

With the place to discuss such things being https://groups.io/g/asterisk-dev

chrsmj commented 3 months ago

The patch is backwards compatible. May we speak about it next week at AstriDevCon?

jcolp commented 3 months ago

We could, but things should still happen or be recorded in a location where others can participate over a period of time and where it can be referenced for historical purposes.

jcolp commented 3 months ago

Also, AEAP handles this by registering multiple engines, each with a unique name, so I don't see why this needs to be as aware of the special ^ separator as it is.
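
For comparison, a sketch of the unique-name approach with the unpatched apps. The section names stt-en and stt-es, the second port, and the extension number are hypothetical; the option names follow the AEAP client format quoted elsewhere in this thread. Only one engine is used per call here, chosen by name.

```
; aeap.conf (sketch; stt-en and stt-es are hypothetical engine names)
[stt-en]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text

[stt-es]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9100
protocol=speech_to_text

; extensions.conf (sketch; assumes the channel language is "en" or "es")
exten => 2327,1,NoOp(A-E-A-P)
 same = n,SpeechCreate(stt-${CHANNEL(language)})
 same = n,SpeechBackground(dial-here-often,5)
 same = n,Set(speechtext0=${SPEECH_TEXT(0)})
 same = n,SpeechDestroy()
```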

chrsmj commented 3 months ago

There is precedent elsewhere for the caret separator when things get clever, e.g. the Dial() 'b' and 'B' options. Colon is for string trimming. Comma is over-used; the patch only adds one more, heh. Extra parenthesis wrapping ))))) is one reason why some shudder and reach for a general-purpose language instead. Bringing some of the cool parts of AEAP functionality back to the plain-jane dialplan DSL is one goal of this patch design.

And in the future, AEAP could be extended with this new horse decorator as well:

[my-speech-to-text]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text

[nyquist]
type=horse
url=ws://127.0.0.1:2016

[majestic_prince]
type=horse
url=ws://127.0.0.1:1969

Then you could race all four:

exten => 33729,1,NoOp(D-E-R-B-Y) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

InterLinked1 commented 3 months ago

> There is precedent elsewhere for the Caret separator when things get clever eg. Dial() 'b' and 'B' options.

The ^ and other characters are generally only used for suboptions within an option (or when the , would not work due to special parsing considerations).

> Comma is over-used

The comma is the standard argument separator - so that would be expected.

nshmyrev commented 3 months ago

I think it's better to control this with the grammar field than to have multiple engine connections.

jcolp commented 3 months ago

@nshmyrev That is what grammar was originally for in older engines, so that would make sense.

chrsmj commented 3 months ago

The current engine:grammar relationship on a channel is 1:N. This patch improves that to N:N.

jcolp commented 3 months ago

Personally, as a user, I would not want that. I would want to just be able to specify multiple grammars and have the engine provide the results back to me ordered by confidence.
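
A sketch of that single-engine, multi-grammar flow using the existing speech apps and functions. The engine name my-engine, the grammar names en and es, and the extension number are hypothetical; whether an engine accepts several active grammars and returns results ordered by confidence depends on the engine, so treat both as assumptions.

```
; extensions.conf (sketch; my-engine, en and es are hypothetical names)
exten => 47266,1,NoOp(G-R-A-M)
 same = n,SpeechCreate(my-engine)
 ; assumed: the engine already knows grammars called en and es
 same = n,SpeechActivateGrammar(en)
 same = n,SpeechActivateGrammar(es)
 same = n,SpeechBackground(dial-here-often,5)
 ; assumed: result 0 is the most confident result
 same = n,Set(resultcount=${SPEECH(results)})
 same = n,Set(besttext=${SPEECH_TEXT(0)})
 same = n,Set(bestscore=${SPEECH_SCORE(0)})
 same = n,Set(bestgrammar=${SPEECH_GRAMMAR(0)})
 same = n,SpeechDestroy()
```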

chrsmj commented 3 months ago

Agreed, it is nice if the engine ranks your results in order of confidence. But not all of them do (yet), cough, Vosk, cough.

And what if your main cloud engine is lagged out, but your less-preferred local backup engines are still available?

This patch lets users solve it themselves in the dial plan.
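
A sketch of that kind of dialplan-level fallback, reusing the proposed caret syntax and the horses from the first example. It assumes, without confirmation from the patch, that a lagged or unavailable horse leaves its SPEECH_TEXT empty; the extension number and the variable name speechtext are hypothetical.

```
; extensions.conf (sketch; treats mage as preferred and secretariat as backup)
exten => 46774,1,NoOp(F-A-L-L-B-A-C-K)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat)
 ; take the preferred horse's text first
 same = n,Set(speechtext=${SPEECH_TEXT(0^mage)})
 ; assumed: an empty string means the preferred horse produced nothing
 same = n,ExecIf($["${speechtext}" = ""]?Set(speechtext=${SPEECH_TEXT(0^secretariat)}))
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
```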

jcolp commented 3 months ago

Okay, I think this really needs a set of non-implementation-specific requirements and user stories.

chrsmj commented 3 weeks ago

Astricon discussion was great, thanks!

Following the Kentucky Derby, I was reminded to clean up the patch a little and fix some memory leaks, but otherwise there is no big rewrite to address the (potential) concerns about feeding the same codec frames to multiple different ASR backends. If you can live with that, for example because you are only using local Vosk-Kaldi-Docker containers to translate one speaker into multiple languages simultaneously, then this might be a good fit for you.

asterisk / asterisk

[new-feature]: Add horse power in to speech engines #593
