calejost / unimrcp

Automatically exported from code.google.com/p/unimrcp
Apache License 2.0

ASR does not work with Nuance MRCP V2 Server #32

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hi Arsen,

I tried running the "unimrcpclient" sample application against a Nuance 
MRCP v2 server (which uses the so-called Nuance 9 for ASR), after changing 
the Content-Type for the DEFINE-GRAMMAR request to "application/srgs+xml", 
as I mentioned in an email.
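Because the Content-Type of the DEFINE-GRAMMAR request comes up repeatedly in this thread, here is a minimal C sketch of how such an MRCPv2 request could be assembled by hand. It is not UniMRCP's API: build_define_grammar(), num_digits(), and the Channel-Identifier value are hypothetical. The one subtle point it illustrates is that the message-length in the MRCPv2 start line counts the entire message, including its own digits, so it has to be computed iteratively.

```c
#include <stdio.h>
#include <string.h>

/* Number of decimal digits needed to print n. */
static size_t num_digits(size_t n)
{
    size_t d = 1;
    while (n >= 10) {
        n /= 10;
        d++;
    }
    return d;
}

/* Hypothetical helper: builds an MRCPv2 DEFINE-GRAMMAR request into buf and
 * returns its length, or 0 if a buffer is too small. The start-line
 * message-length counts the whole message, including its own digits. */
static size_t build_define_grammar(char *buf, size_t cap,
                                   const char *channel_id,
                                   unsigned long request_id,
                                   const char *content_type,
                                   const char *content_id,
                                   const char *body)
{
    static const char prefix[] = "MRCP/2.0 ";
    char rest[1024]; /* everything after the message-length field */
    int n = snprintf(rest, sizeof rest,
                     " DEFINE-GRAMMAR %lu\r\n"
                     "Channel-Identifier: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Id: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"
                     "%s",
                     request_id, channel_id, content_type, content_id,
                     strlen(body), body);
    if (n < 0 || (size_t)n >= sizeof rest)
        return 0;

    size_t fixed = (sizeof prefix - 1) + (size_t)n;
    size_t total = fixed + num_digits(fixed);
    /* Adding the length digits can add another digit; iterate until stable. */
    while (fixed + num_digits(total) != total)
        total = fixed + num_digits(total);

    int m = snprintf(buf, cap, "%s%zu%s", prefix, total, rest);
    if (m < 0 || (size_t)m >= cap)
        return 0;
    return (size_t)m;
}
```

The same helper would serve both cases discussed here: content type "application/srgs+xml" with an SRGS document as the body, or "text/uri-list" with "builtin:grammar/digits" as the body.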

Things seemed to unfold normally until after UniMRCP sent the RECOGNIZE 
request and the server replied with a “200 IN-PROGRESS”.

The first weird thing followed after that. I see that UniMRCP 
opened “one.pcm” only about 4 seconds later. Some RTP data was received by 
the server, since Nuance sent a “START-OF-INPUT 2 IN-PROGRESS”
after “one.pcm” was opened. I also saw a lot of RTP packets in the
Wireshark output.

But 10 seconds later, the server sent a “RECOGNITION-COMPLETE 2 COMPLETE” 
with a Completion-Cause: 015 (“no-match-maxtime”).

I don’t have any clue. The UniMRCP log file, the Nuance MRCP server log file,
and the Wireshark capture file will be uploaded in a zip archive
called “ASR_does_not_work_with_Nuance_MRCP_V2_Server.zip”. The Nuance MRCP
server log file doesn’t mention anything about RTP, but maybe that
kind of information was not enabled in the log. I will open a case with
Nuance support as well. I used the latest UniMRCP release, r930.

Thanks a lot and best regards.

Original issue reported on code.google.com by Curat...@gmail.com on 10 Jun 2009 at 12:20

GoogleCodeExporter commented 8 years ago
Hi Vlad,

> The first weird thing followed after that. I see that UniMRCP opened
> “one.pcm” only about 4 seconds later.

Looking at the code, I cannot imagine how this could happen, unless you put some
delay (Sleep) in the code. The demo recog application creates and sends the
RECOGNIZE request and initializes the input stream immediately afterward.

> But 10 seconds later, the server sent a “RECOGNITION-COMPLETE 2 COMPLETE”
> with a Completion-Cause: 015 (“no-match-maxtime”).

Some recognition engines require about 1 second of silence after the actual
utterance. I'm attaching a modified "one.pcm" file, which contains an
additional 1-second gap. Just give it a try; it should do the trick.
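Padding a raw recording with trailing silence, as in the modified "one.pcm", can be sketched in C as follows. This assumes the file is headerless 8 kHz, 16-bit signed linear mono PCM, where digital silence is a run of zero bytes; other formats (for example, mu-law) use a different silence value and byte rate. pcm_bytes() and append_silence() are hypothetical helpers, not part of UniMRCP.

```c
#include <stdio.h>

/* Bytes of audio for `ms` milliseconds of mono PCM. */
static size_t pcm_bytes(unsigned ms, unsigned sample_rate,
                        unsigned bytes_per_sample)
{
    return (size_t)sample_rate * ms / 1000u * bytes_per_sample;
}

/* Appends `ms` of digital silence (zero samples, valid for signed linear
 * PCM) to the end of an open file. Returns 0 on success, -1 on error. */
static int append_silence(FILE *f, unsigned ms, unsigned sample_rate,
                          unsigned bytes_per_sample)
{
    static const unsigned char zeros[512] = {0};
    size_t left = pcm_bytes(ms, sample_rate, bytes_per_sample);

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    while (left > 0) {
        size_t chunk = left < sizeof zeros ? left : sizeof zeros;
        if (fwrite(zeros, 1, chunk, f) != chunk)
            return -1;
        left -= chunk;
    }
    return 0;
}
```

For instance, opening the file with fopen(path, "r+b") and calling append_silence(f, 1000, 8000, 2) would add one second, i.e. 16000 bytes.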

Original comment by achalo...@gmail.com on 10 Jun 2009 at 6:41

GoogleCodeExporter commented 8 years ago
Hi Arsen,

No, I did not introduce any delay; that would have been wicked :-)...

In fact, until yesterday the only change I had made to the code was the one
that you indicated in the build instructions, which allows the property sheets
to point to the specific locations of the APR and Sofia-SIP solutions.

Yesterday, I changed the Content-Type as I mentioned earlier, and this was the
only real change I made.

The behavior is strange. Sometimes that delay does not occur, but sometimes it
does. The recognition always fails, most of the time with a Completion-Cause of
“no-match-maxtime”, but a few times I got “Completion-Cause: 002 no-input-timeout”,
which is even worse. I used only your new PCM file today, but I didn’t notice any
difference.

There is another strange thing: I see two INVITE messages being sent by the MRCP
client. This happens every time. The second INVITE seems to be chopped in the
middle of the SDP body, but the server still acknowledges it and sends an OK in
response. I used different versions of Wireshark and captured the data both on
the machine running the MRCP client and on the MRCP server to make sure that it
is not a false report.

The attached log file contains a few no-input-timeout results, but unfortunately
I do not have the Wireshark logs for those, and I couldn’t reproduce this
behavior while Wireshark was running.

Sorry for the trouble. Nuance tech support has ignored my complaint so far :-(.

Best regards.

Original comment by Curat...@gmail.com on 11 Jun 2009 at 3:10

GoogleCodeExporter commented 8 years ago
Hi Vlad,
This looks a bit strange to me, but I'd suggest a few things to try out.

1. Try specifying the built-in grammar for digits.
The DEFINE-GRAMMAR request should contain:
Content-Type: text/uri-list
Content-Id: your-content-id
Content-Length: xx

builtin:grammar/digits

2. Start streaming audio as soon as the IN-PROGRESS response to the RECOGNIZE
request is received. Currently the demo recog application starts streaming
without waiting for the response, and that no longer looks quite correct to me.

3. Finally, if neither of the above suggestions helps, try using the
START-INPUT-TIMERS method as follows:

C->S: RECOGNIZE with "Start-Input-Timers: false" header
S->C: IN-PROGRESS
C->S: START-INPUT-TIMERS
S->C: COMPLETE
C->S: Start streaming
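The ordering in suggestions 2 and 3 can be expressed as a small client-side state machine. This is a hypothetical sketch, not the structure of the UniMRCP demo, and all names in it are made up for illustration. With deferred_timers set, the RECOGNIZE is assumed to carry "Start-Input-Timers: false", and audio starts only after the START-INPUT-TIMERS request completes; without it, audio starts on IN-PROGRESS.

```c
#include <stdbool.h>

typedef enum {
    ST_IDLE,              /* nothing sent yet */
    ST_AWAIT_IN_PROGRESS, /* RECOGNIZE sent */
    ST_AWAIT_TIMERS_OK,   /* START-INPUT-TIMERS sent */
    ST_STREAMING,         /* audio may flow */
    ST_DONE               /* RECOGNITION-COMPLETE received */
} recog_state_t;

typedef enum {
    EV_RECOGNIZE_SENT,
    EV_IN_PROGRESS,       /* 200 IN-PROGRESS for RECOGNIZE */
    EV_TIMERS_COMPLETE,   /* 200 COMPLETE for START-INPUT-TIMERS */
    EV_RECOGNITION_COMPLETE
} recog_event_t;

typedef enum {
    ACT_NONE,
    ACT_SEND_START_INPUT_TIMERS,
    ACT_START_STREAMING,
    ACT_STOP_STREAMING
} recog_action_t;

/* Advances the state machine and returns what the client should do next. */
static recog_action_t recog_step(recog_state_t *st, recog_event_t ev,
                                 bool deferred_timers)
{
    switch (*st) {
    case ST_IDLE:
        if (ev == EV_RECOGNIZE_SENT)
            *st = ST_AWAIT_IN_PROGRESS;
        return ACT_NONE;
    case ST_AWAIT_IN_PROGRESS:
        if (ev == EV_IN_PROGRESS) {
            if (deferred_timers) {
                *st = ST_AWAIT_TIMERS_OK;
                return ACT_SEND_START_INPUT_TIMERS;
            }
            *st = ST_STREAMING;
            return ACT_START_STREAMING;
        }
        break;
    case ST_AWAIT_TIMERS_OK:
        if (ev == EV_TIMERS_COMPLETE) {
            *st = ST_STREAMING;
            return ACT_START_STREAMING;
        }
        break;
    case ST_STREAMING:
        if (ev == EV_RECOGNITION_COMPLETE) {
            *st = ST_DONE;
            return ACT_STOP_STREAMING;
        }
        return ACT_NONE;
    case ST_DONE:
        return ACT_NONE;
    }
    /* RECOGNITION-COMPLETE before streaming began (e.g. an error). */
    if (ev == EV_RECOGNITION_COMPLETE)
        *st = ST_DONE;
    return ACT_NONE;
}
```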

Hope this helps.
BTW, a number of users have tried UniMRCP with Nuance since an early stage of
development. I did it too, a year or so ago, and I'm sure it will eventually
work for you too.

Original comment by achalo...@gmail.com on 11 Jun 2009 at 6:19

GoogleCodeExporter commented 8 years ago
Hi Arsen,

Let me try to identify and dissociate the issues. I think there are three 
problems 
here:

1.  A SIP problem: the client sends two INVITE messages in the same session,
without any good reason. I can see them in Wireshark and in the Nuance server
log file. Can this be a Sofia-SIP problem?

2.  Sometimes, but not often, there is a gap in the UniMRCP client log file 
before the file “one.pcm” is opened. Not sure what causes this to happen.

3.  The server cannot recognize the utterance in the allotted time (about ten 
seconds). I can see two potential causes for this: 

-   either the server cannot use the vxml grammar for some obscure reason; 
-   or the RTP data is ill formatted or it is sent to the wrong port or it is 
sent too early or too late.

I used your first suggestion, and the result is in the posted log file (same
outcome: “no-match-maxtime”).

I don’t know how to start the RTP transmission exactly after the “200
IN-PROGRESS” is received. I believe that the UniMRCP library decides when to
start sending RTP, probably when you create the “channel” and associate it with
a “termination”. But, as you may remember, the RTP packets do not contain any
audio data before IN-PROGRESS is received. You do this using the Boolean
variable recog_channel->streaming, which is set to FALSE until IN-PROGRESS has
occurred.
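The gating described here, where RTP keeps flowing but carries no audio until IN-PROGRESS, might look like this on the media side. This is a hypothetical sketch assuming 20 ms frames of 8 kHz, 16-bit linear PCM; it does not reproduce UniMRCP's actual stream callback, and the `streaming` argument merely stands in for recog_channel->streaming.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* 20 ms of 8 kHz, 16-bit mono linear PCM (assumed stream format). */
#define FRAME_BYTES 320

/* Fills one outgoing frame. File audio is read only once `streaming` is
 * set (i.e. after IN-PROGRESS); before that, or when the file runs out,
 * the frame is filled with silence. Returns true if it carries file audio. */
static bool fill_frame(FILE *audio_in, bool streaming,
                       unsigned char frame[FRAME_BYTES])
{
    if (streaming && audio_in != NULL &&
        fread(frame, 1, FRAME_BYTES, audio_in) == FRAME_BYTES)
        return true;
    memset(frame, 0, FRAME_BYTES);
    return false;
}
```

Note that the file position does not advance while `streaming` is false, so the utterance is not partially consumed before the server is ready.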

As for the third suggestion, honestly, I don’t think that it would make much
difference. I believe that the input timers are started correctly when the
RECOGNIZE command is sent.

In our current application, which uses the Nuance 8.5 API (the release prior to
the introduction of MRCP), we start the equivalent of the “input timers” when
the prompt finishes playing. In the “unimrcpclient” demo there are no prompts,
so it is appropriate to start the input timers when RECOGNIZE is sent. Besides,
the time difference between the RECOGNIZE request and IN-PROGRESS is negligible.

We used an audio file which contained five seconds of silence, and the server
correctly returned “no-input-timeout” in this case. This is encouraging. I hope
I can get to the bottom of this problem somehow.

However, the first two problems (the doubled INVITE and the delay before the
audio file is opened) are beyond me.

I attached a new log file and the two source files which I modified to specify
the built-in grammar, as you recommended, and to add a few log statements for RTP.

Thanks and best regards.

Original comment by Curat...@gmail.com on 12 Jun 2009 at 3:25

Attachments:

GoogleCodeExporter commented 8 years ago
Hi Vlad,

1. A SIP problem: the client sends two INVITE messages in the same session,
without any good reason...
Sorry, but I'm not sure what you mean. According to the demo scenario, the
client intentionally sends two SIP INVITE messages: the first for resource
allocation and the second for removal.
C -> S: SIP INVITE or RTSP SETUP   (add recognizer channel)
S -> C: SIP OK or RTSP OK
C -> S: MRCP RECOGNIZE
S -> C: MRCP IN-PROGRESS
C -> S: RTP Start Transmission
S -> C: MRCP START-OF-INPUT
S -> C: MRCP RECOGNITION-COMPLETE
C -> S: RTP Stop Transmission
C -> S: SIP INVITE or RTSP SETUP   (optionally remove recognizer channel)
S -> C: SIP OK or RTSP OK
C -> S: SIP BYE or RTSP TEARDOWN
S -> C: SIP OK or RTSP OK
The second INVITE is optional, but it is intentional in the demo.

Initially I assumed it could just be another INVITE sent as a retransmission,
because no SIP 100 Trying came back, but I double-checked the latest Wireshark
captures you sent, and there is no such occurrence.

2. Sometimes, but not often, there is a gap in the UniMRCP client log...
I've never observed such a delay on my side, and again, looking at the code, I
cannot see any blocking function call. The only thing I suspect now is that
fopen() may cause such a delay; see the recog_application_on_define_grammar()
function. You may want to add more output around the fopen() call to narrow down
the problem.

3. The server cannot recognize the utterance in the allotted time
-   either the server cannot use the vxml grammar for some obscure reason;
I'd experiment with the grammars more.

-   or the RTP data is ill formatted
No
-   or it is sent to the wrong port
No
-   or it is sent too early or too late.
I see it is indeed sent as soon as IN-PROGRESS is received, and the behavior of
the client is correct, but I still suspect the problem may be in the timing or
the grammar.
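To follow up on the fopen() suspicion in point 2, a wrapper like the one below could log how long the call actually takes. It is a sketch using C11 timespec_get(); timed_fopen() is a hypothetical name. TIME_UTC is a wall clock, which is what matters for spotting a blocking call, though it can be affected by system clock adjustments.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper around fopen() that also reports the wall-clock
 * time the call took, in milliseconds, so a slow open stands out in logs. */
static FILE *timed_fopen(const char *path, const char *mode, double *elapsed_ms)
{
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    FILE *f = fopen(path, mode);
    timespec_get(&t1, TIME_UTC);

    *elapsed_ms = (double)(t1.tv_sec - t0.tv_sec) * 1000.0 +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1.0e6;
    return f;
}
```

Calling timed_fopen() in place of the fopen() in recog_application_on_define_grammar() and printing elapsed_ms would show whether the roughly 4-second gap is spent inside the open itself.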

I've got another good idea. Nuance comes with client test cases. They should be
in the directory where Nuance is installed (NuanceDir\client\scripts).
Try running the demo.mrcp script instead of the UniMRCP client and see how it
works. It should help identify the problem. If I'm not mistaken, it should be
possible to just copy the NuanceDir\client directory, including its
subdirectories, to another PC and run it there.
Hope this helps
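On the two INVITEs: as I understand MRCPv2 (RFC 6787), a client releases a resource by sending a new SDP offer (a re-INVITE) in which the port of that resource's control "m=" line is set to 0, so a second INVITE before the BYE is expected. The hypothetical helper below only illustrates that check on a single m-line; a real client would use a full SDP parser.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if an SDP "m=application" line has port 0, which in an
 * MRCPv2 re-INVITE offer releases that resource's control channel. */
static bool sdp_mline_releases_resource(const char *mline)
{
    static const char prefix[] = "m=application ";
    unsigned port;

    if (strncmp(mline, prefix, sizeof prefix - 1) != 0)
        return false;
    if (sscanf(mline + sizeof prefix - 1, "%u", &port) != 1)
        return false;
    return port == 0;
}
```

For example, "m=application 9 TCP/MRCPv2 1" (a typical client offer) is not a release, while "m=application 0 TCP/MRCPv2 1" is.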

Original comment by achalo...@gmail.com on 12 Jun 2009 at 5:34

GoogleCodeExporter commented 8 years ago
Thanks Arsen,

1. Well, what I meant is that there were two INVITEs before the BYE :-). I had
no idea that this is standard procedure. My mistake, sorry. It is quite
reassuring that this was done on purpose! I will read the specs more carefully.

To me, this was the most bizarre behavior. If it is intentional and legitimate
(as the Nuance server replies also seem to imply), then no problem! Again, I am
sorry that I forced you to write such a long explanation (yes, at one point I
also thought that it might be a retry...). I apologize.

2. I will watch this... Let's set it aside for now.

3. Don't spend time on the third problem either. I will try Nuance's own demo
app, if I can find it, and try to push their tech support to help me.

Probably, the content of your grammar is fine. I wonder when the grammar is
analyzed by the server: when it is received, or only when a recognition
operation that uses it is attempted. Probably the latter...

It was a good idea to try the built-in grammar. I think this experiment may rule
out the possibility that the content of your grammar was bad, since the built-in
grammar yields the same result... Not sure what could be wrong about the timing
of the DEFINE-GRAMMAR...

We tried several PCM files. We tried our own "one", recorded by my colleague with
a microphone, and got the same no-match-maxtime result. We tried a file containing
a few seconds of silence and got no-input-timeout, as expected.

Finally, I was able to configure the Nuance server to record the utterance it
received. I listened to it and it sounded perfect. So it seems to me that the
ball is in their court.

Interesting problem, though...

Have a nice weekend!

Original comment by Curat...@gmail.com on 12 Jun 2009 at 10:08

GoogleCodeExporter commented 8 years ago
1. Well Vlad, I see no problem over here, go ahead!

3. Indeed, a rather interesting problem. Anyway, I'd suggest trying their client
next and comparing what is going on. Please keep me posted.

Have a nice weekend!

Original comment by achalo...@gmail.com on 13 Jun 2009 at 3:03

GoogleCodeExporter commented 8 years ago
The mystery was solved. UniMRCP client demo works great, very good job, Arsen.

The solution to the problem was much simpler than the problem itself. Two things
helped: the Nuance documentation and “Cool Edit”.

Nuance has a timer called “incompletetimeout”, which controls how long the
silence after callers stop speaking must last before the server concludes that
they have finished. The default value is 1.5 seconds. The file “one.pcm” has a
reasonable amount of silence after the end of speech (about one second, or
precisely one second, I guess). Our own prompts and Nuance’s demo prompt have
even less. Apparently, if the silence is shorter than the value of
“incompletetimeout”, the Nuance server believes that the voice was cut short by
some kind of malfunction and declares this to be an error (015
no-match-maxtime). I discovered this first by accident after I increased the
post-speech silence to two seconds; then I read the manual and learned about the
parameter. The default value of this parameter can be changed in a variety of
ways on the server. Here is how it can be done in the grammar file:

<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" 
version="1.0" 
mode="voice" root="digit">
  <meta name="incompletetimeout" content="1100"/>
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>
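The rule found here can be checked offline before a file is sent. The sketch below measures the trailing silence of a 16-bit linear PCM buffer and computes how much padding would be needed to reach a given incompletetimeout. The amplitude threshold that counts as "silence" and both helper names are assumptions; given the experience above (two seconds of post-speech silence worked), some margin beyond the exact timeout may be prudent.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Length, in milliseconds, of the run of near-silent samples at the end of
 * a 16-bit mono linear PCM buffer. A sample counts as silent if its
 * absolute value is at most `threshold`. */
static unsigned trailing_silence_ms(const int16_t *samples, size_t count,
                                    unsigned sample_rate, int threshold)
{
    size_t n = 0;
    while (n < count && abs(samples[count - 1 - n]) <= threshold)
        n++;
    return (unsigned)(n * 1000u / sample_rate);
}

/* Milliseconds of silence to append so the trailing silence reaches
 * incomplete_timeout_ms (0 if it already does). */
static unsigned silence_shortfall_ms(unsigned trailing_ms,
                                     unsigned incomplete_timeout_ms)
{
    return trailing_ms >= incomplete_timeout_ms
               ? 0u
               : incomplete_timeout_ms - trailing_ms;
}
```

With the default of 1500 ms and a file ending in one second of silence, silence_shortfall_ms() reports that 500 ms more would be needed.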

Original comment by Curat...@gmail.com on 16 Jun 2009 at 1:57

GoogleCodeExporter commented 8 years ago
Hi there,

I also found this to occur with Nuance Recognizer 9.0.3 with Nuance Media Server
5.1.0. According to Nuance, UniMRCP does not send an RTCP BYE ("goodbye") packet
to indicate the end of the RTP stream, and their server will not process the
audio until it is received; as a result, it returns "015 no-match-maxtime".

I think that, somehow, we'll need to get UniMRCP to send this packet at the end
of an RTP stream; otherwise there will always be this reliance on silence at the
end of the audio samples.
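For reference, a bare RTCP BYE packet for a single SSRC with no reason text, as defined in RFC 3550 section 6.6, is only 8 bytes. The sketch below builds one; it is hypothetical and not UniMRCP code. RFC 3550 requires RTCP packets to be sent as compound packets that begin with an SR or RR report, with the BYE last, and they go to the RTCP port (conventionally the RTP port plus one), so a real sender would wrap this accordingly.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a minimal RTCP BYE (RFC 3550, section 6.6) for one SSRC, with no
 * reason text, into out[8]. Returns the number of bytes written. */
static size_t rtcp_build_bye(uint32_t ssrc, uint8_t out[8])
{
    out[0] = 0x80 | 1;   /* V=2, P=0, SC=1 (one SSRC listed) */
    out[1] = 203;        /* PT = BYE */
    out[2] = 0;          /* length = 32-bit words minus one: */
    out[3] = 1;          /*   2 words - 1 = 1 */
    out[4] = (uint8_t)(ssrc >> 24);  /* SSRC in network byte order */
    out[5] = (uint8_t)(ssrc >> 16);
    out[6] = (uint8_t)(ssrc >> 8);
    out[7] = (uint8_t)ssrc;
    return 8;
}
```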

Kind regards,
Derik

Original comment by thirion...@gmail.com on 20 Oct 2009 at 8:36