adhearsion / punchblock

Telephony middleware library for Ruby
adhearsion.com/punchblock
MIT License
40 stars 34 forks source link

Problems with ASR and the :native_or_unimrcp renderer #223

Open Jared-Prime opened 10 years ago

Jared-Prime commented 10 years ago

Following up on a recent email and submitting an issue as suggested, I'd like to clarify what we need at Ifbyphone.

Preamble

We know we have these options available to us:

Our defaults are a :unimrcp recognizer and a :native_or_unimrcp renderer. We use the :native_or_unimrcp renderer specifically to allow playback of audiofiles natively and TTS over unimrcp since Neospeech doesn't support audio tags.

We don't get an input type that allows us to pass along the appropriate grammar.

Our Approaches

  1. We first tried sending a voice only grammar. The switch defaults to ComposedPrompt. Upon validating the options, we get an error saying voice is not supported by default
  2. Then we tried changing the renderer to :unimrcp. This succeeds but only without audio tags.
  3. Finally, we added to the switch here to allow :native_or_unimrcp to create an MRCPPrompt, and added to the validator to have the renderer accepted. However, it's not a great solution and we'd still miss the fallback functionality.

    TL;DR

The easiest thing for us to do is to simply use a :unimrcp renderer; yet the rub of the issue is that the fix for TTS fallback does not work with ASR since the fix only applies to ComposedPrompt and not also MRCPPrompt.

What do you suggest? We need both the fallback and ASR.

Jared-Prime commented 10 years ago

cc @sfgeorge @runningferret

benlangfeld commented 10 years ago

So, the most obvious thing (renderer = :native_or_unimrcp, recognizer = :unimrcp, which is a fairly precise description of what you want) is about the only thing you didn't list as having tried. Any thoughts on that?

The reason your first option didn't work is because input defaults to the native recognizer, which only handles DTMF. You have to specify that you want to use a recognizer via UniMRCP if you want ASR.

Jared-Prime commented 10 years ago

that throws an error https://github.com/adhearsion/punchblock/blob/develop/lib/punchblock/translator/asterisk/call.rb#L230

Jared-Prime commented 10 years ago

^ edited to point to correct line

benlangfeld commented 10 years ago

So the solution here is to add :native_or_unimrcp as a clause in the inner case statement here to create a ComposedPrompt, similar to your previous PR. I'd be happy to accept a pull request to that effect, while I wasn't happy to make an alias to get to MRCPPrompt, which doesn't provide what you need.

Apologies for this being a round-about review, but it's not been clear up until this explanation what the problem was, and thus the solution was quite confused.

Jared-Prime commented 10 years ago

does a ComposedPrompt give us ASR?

benlangfeld commented 10 years ago

You know, it doesn't... I could have sworn it was implemented as a composition of entirely separate Input and Output components, but it's actually an Input component (DTMF only) with a nested Output.

Thinking it through further, though, UniMRCP doesn't emit a barge event over AMI, making it impossible to construct a prompt component in this way anyway. This is going to require changes to UniMRCP also :(

Jared-Prime commented 10 years ago

Barge events do work with UniMRCP. I think just MRCPPrompt needs to be updated to incorporate the same behavior behind :native_or_unimrcp. I wonder what the cleanest approach is, without having to duplicate code between ComposedPrompt and MRCPPrompt

benlangfeld commented 10 years ago

I know that barge events work, but UniMRCP does not propagate these as AMI events such that we can use them. MRCPPrompt leaves the TTS engine to do all rendering via UniMRCP's SynthAndRecog, and links barge internally. It cannot be adapted to your fallback requirements outside of Asterisk (eg by Punchblock). MRCPNativePrompt is the same, except rendering is done by Asterisk from a list of filenames.

You can forget all varieties of MRCPPrompt if you want your TTS optimisation, unless you want to go and re-implement that in Asterisk C code (which would be a great contribution if it was done flexibly enough, rather than just an MVP for your use).

Jared-Prime commented 10 years ago

Ah, I see what you mean. Ideally we'd have a handler registered on RECOG much like we have on DTMF.

We'll give this some thought and let you know what we come up with in regards to Punchblock. In the meantime, have a nice weekend!

bklang commented 9 years ago

@Jared-Prime Any further thoughts on this? Has the situation with UniMRCP changed at all?

Jared-Prime commented 9 years ago

Yes, I believe Asren's additions to UniMRCP SynthAndRecog should help with this. In my opinion, we can probably close the issue.

/cc @sfgeorge

sfgeorge commented 9 years ago

I agree with @Jared-Prime here, I believe that UniMRCP's SynthAndRecog newly integrated support for multiple TTS and/or audio prompts reduces the need for detailed AMI events during prompt progression.

One caveat: To rely on audio support in SynthAndRecog(), it is necessary for Adhearsion-Asterisk users to check that their audio files exist on-disk before passing them over to UniMRCP.