Hi Vlad,
>The first weird thing followed after that. I see that UniMRCP opened
“one.pcm” only
about 4 seconds later.
Looking at the code, I cannot imagine how this could happen, unless you put some
delay (Sleep) in the code.
The demo recog application creates and sends the RECOGNIZE request and
initializes the input stream immediately afterward.
> But 10 seconds later, the server sent a “RECOGNITION-COMPLETE 2 COMPLETE”
with a
Completion-Cause: 015 (“no-match-maxtime”).
Some recognition engines require a gap of about 1 sec between actual utterances.
I'm attaching a modified "one.pcm" file, which contains an additional 1 sec gap.
Just give it a try, it should do the trick.
Original comment by achalo...@gmail.com
on 10 Jun 2009 at 6:41
Attachments:
Hi Arsen,
No, I did not introduce any delay; that would have been wicked :-)...
In fact, until yesterday the only change I had made to the code was the one
that you
had indicated in the build instructions which allows the property sheets to
point to
the specific locations of the APR and SofiaSIP solutions.
Yesterday, I changed the Content-Type as I mentioned earlier, and this was the
only
real change I made.
The behavior is strange. Sometimes that delay does not occur but sometimes it
does.
The recognition always fails, most of the time with a Completion-Cause of
“no-match-timeout”, but a few times I got “Completion-Cause: 002
no-input-timeout”, which is even worse. I used only your new PCM file today,
but I didn’t notice any difference.
There is another strange thing: I see two INVITE messages being sent by the
MRCP client. This happens every time. The second INVITE seems to be chopped in
the middle of the SDP body, but the server still acknowledges it and sends an OK
in response. I used different versions of Wireshark and captured the data both
on the machine running the MRCP client and on the MRCP server to make sure that
it is not a false report.
The attached log file contains a few no-input-timeout results, but unfortunately
I do not have the Wireshark logs for those and I couldn’t reproduce this
behavior while Wireshark was running.
Sorry for the trouble. Nuance tech support has ignored my complaint so far :-(.
Best regards.
Original comment by Curat...@gmail.com
on 11 Jun 2009 at 3:10
Attachments:
Hi Vlad,
This looks a bit strange to me, but I'd suggest a few things to try out.
1. Try to specify built-in grammar for digits.
DEFINE-GRAMMAR should contain
Content-Type:text/uri-list
Content-Id: your-content-id
Content-Length: xx
builtin:grammar/digits
2. Start streaming audio as soon as the IN-PROGRESS response to the RECOGNIZE
request is received. Currently the demo recog application starts streaming
without waiting for the response, which no longer looks correct to me.
3. Finally, if neither of the above suggestions helps, try to use the
START-INPUT-TIMERS method as follows:
C->S: RECOGNIZE with "Start-Input-Timers: false" header
S->C: IN-PROGRESS
C->S: START-INPUT-TIMERS
S->C: COMPLETE
C->S: Start streaming
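
For suggestion 1, the easy thing to get wrong is Content-Length, which has to match the byte length of the body. Here is a minimal Python sketch that assembles only the entity headers and body listed above. The function name grammar_entity is hypothetical, the CRLF after the URI is an assumption, and the MRCP start line and channel headers are deliberately left out.

```python
def grammar_entity(content_id: str, grammar_uri: str) -> str:
    """Return the entity headers and body for a DEFINE-GRAMMAR request.

    Only the part shown in the suggestion is built here; the MRCP start
    line and Channel-Identifier header are not included.
    """
    # text/uri-list carries one URI per line (CRLF terminator assumed here).
    body = grammar_uri + "\r\n"
    headers = [
        "Content-Type: text/uri-list",
        f"Content-Id: {content_id}",
        # Content-Length counts bytes of the body, not characters.
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


entity = grammar_entity("your-content-id", "builtin:grammar/digits")
print(entity)
```

Computing the length from the encoded body, rather than typing it by hand, avoids a mismatch when the URI changes.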
Hope this helps.
BTW, a number of users have tried UniMRCP with Nuance since the early stages of
development. I did so too a year or so ago, and I'm sure it will eventually
work for you too.
Original comment by achalo...@gmail.com
on 11 Jun 2009 at 6:19
Hi Arsen,
Let me try to identify and dissociate the issues. I think there are three
problems
here:
1. A SIP problem: the client sends two INVITE messages in the same session,
without any good reason. I can see them in the WireShark and in the Nuance
server
log file. Can this be a Sofia problem???
2. Sometimes, but not often, there is a gap in the UniMRCP client log file
before the file “one.pcm” is opened. Not sure what causes this to happen.
3. The server cannot recognize the utterance in the allotted time (about ten
seconds). I can see two potential causes for this:
- either the server cannot use the vxml grammar for some obscure reason;
- or the RTP data is ill formatted or it is sent to the wrong port or it is
sent too early or too late.
I used your first suggestion and the result is in the log file posted (same
outcome: “no-match-timeout”).
I don’t know how to start the RTP transmission exactly after the “200
IN-PROGRESS” is received. I believe that the UniMRCP library decides when to
start sending RTP, probably when you create the “channel” and associate it
with a “termination”. But the RTP messages do not contain any audio data before
IN-PROGRESS is received, as you may remember. You do this using the Boolean
variable recog_channel->streaming, which is set to FALSE until after
IN-PROGRESS has occurred.
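
The gating described above can be illustrated with a small Python sketch. It is not the UniMRCP code, which is written in C. The class and method names are hypothetical and the 160-byte frame size is an assumption. It only shows a flag that holds audio back until IN-PROGRESS arrives.

```python
class RecogChannel:
    """Toy model of a channel that withholds audio until IN-PROGRESS."""

    def __init__(self, audio: bytes, frame_size: int = 160):
        self.audio = audio
        self.frame_size = frame_size
        self.offset = 0
        self.streaming = False  # stays False until IN-PROGRESS is seen

    def on_response(self, request_state: str) -> None:
        # Called when the response to RECOGNIZE arrives.
        if request_state == "IN-PROGRESS":
            self.streaming = True

    def read_frame(self):
        # Called once per packetization interval by the media stack.
        if not self.streaming:
            return None  # nothing to send yet
        frame = self.audio[self.offset:self.offset + self.frame_size]
        self.offset += len(frame)
        return frame or None  # None again once the audio is exhausted


channel = RecogChannel(b"\x01" * 320)
before = channel.read_frame()          # no audio before IN-PROGRESS
channel.on_response("IN-PROGRESS")
first = channel.read_frame()
second = channel.read_frame()
after_end = channel.read_frame()
```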
As for the third suggestion, honestly, I don’t think that it would make much
difference. I believe that the input timers are started correctly when the
RECOGNIZE command is sent.
In our current application, which uses Nuance 8.5 API (the release prior to the
introduction of MRCP), we start the equivalent of the “input timers”, when
the
prompt finished playing. In the “unimrcpclient” demo, there are no prompts,
so it is
appropriate to start the input timers when RECOGNIZE is sent. Besides, the time
difference between the RECOGNIZE request and IN-PROGRESS is infinitesimal.
We used an audio file which contained five seconds of silence and the server
correctly returned “no-input-timeout”, in this case. This is encouraging. I
hope I
can get to the bottom of this problem somehow.
However, the first two problems (doubled INVITE and the delay before the late
opening of the audio-file) are beyond me.
I attached a new log file and the two source files which I modified to specify
the
built-in grammar as you recommended and to add a few log statements for RTP.
Thanks and best regards.
Original comment by Curat...@gmail.com
on 12 Jun 2009 at 3:25
Attachments:
Hi Vlad,
1. A SIP problem: the client sends two INVITE messages in the same session,
without
any good reason...
Sorry, but I'm not sure what you meant. According to the demo scenario, the
client intentionally sends two SIP INVITE messages: the first for resource
allocation and the second for removal.
C -> S: SIP INVITE or RTSP SETUP (add recognizer channel)
S -> C: SIP OK or RTSP OK
C -> S: MRCP RECOGNIZE
S -> C: MRCP IN-PROGRESS
C -> S: RTP Start Transmission
S -> C: MRCP START-OF-INPUT
S -> C: MRCP RECOGNITION-COMPLETE
C -> S: RTP Stop Transmission
C -> S: SIP INVITE or RTSP SETUP (optionally remove recognizer channel)
S -> C: SIP OK or RTSP OK
C -> S: SIP BYE or RTSP TEARDOWN
S -> C: SIP OK or RTSP OK
The second INVITE is optional but intentional for the demo.
Initially I assumed it could be just a retransmission of the INVITE, sent
because no SIP TRYING came back, but I double-checked the latest Wireshark
captures you sent and there is no such occurrence.
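
One way to tell a retransmission from a second, intentional INVITE in a capture is to compare the CSeq number and the top Via branch. A retransmission repeats both, while a new request in the same dialog carries a new CSeq and branch. Here is a minimal Python sketch of that check. The function name and the sample branch values are made up for illustration.

```python
def classify_invites(invites):
    """Label each captured INVITE as a new request or a retransmission.

    invites: list of (cseq_number, via_branch) tuples in capture order.
    """
    seen = set()
    labels = []
    for cseq, branch in invites:
        key = (cseq, branch)
        if key in seen:
            # Same CSeq and same branch: the identical request was resent.
            labels.append("retransmission")
        else:
            labels.append("new request")
            seen.add(key)
    return labels


# Two distinct INVITEs, as in the demo scenario (add, then remove channel).
demo_capture = classify_invites([(1, "z9hG4bK-aaa"), (2, "z9hG4bK-bbb")])
# The same INVITE seen twice on the wire.
resent_capture = classify_invites([(1, "z9hG4bK-aaa"), (1, "z9hG4bK-aaa")])
```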
2. Sometimes, but not often, there is a gap in the UniMRCP client log...
I've never observed such a delay on my side. Again, looking at the code, I
cannot see any blocking function call. The only thing I suspect now is that
fopen() may cause such a delay. See the recog_application_on_define_grammar()
function.
You may want to add more output around the fopen() call to narrow down the
problem.
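
The idea of timing the file open can be sketched in Python as follows. The demo application itself is C, so this only illustrates the technique. The helper name timed_open and the temporary file are stand-ins, and the same pattern would apply around fopen().

```python
import os
import tempfile
import time


def timed_open(path, mode="rb", log=print):
    """Open a file and log how long the open call itself took."""
    start = time.perf_counter()
    handle = open(path, mode)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    log(f"open({path!r}) took {elapsed_ms:.1f} ms")
    return handle, elapsed_ms


# Stand-in for "one.pcm": a small temporary file created for the demo.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16)
    demo_path = tmp.name

handle, elapsed_ms = timed_open(demo_path)
handle.close()
os.remove(demo_path)
```

If the logged time is consistently small, the gap in the client log has to come from somewhere other than the open call.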
3. The server cannot recognize the utterance in the allotted time
- either the server cannot use the vxml grammar for some obscure reason;
I'd try to play with grammars more.
- or the RTP data is ill formatted
No
- or it is sent to the wrong port
No
- or it is sent too early or too late.
I see it's indeed sent as soon as IN-PROGRESS is received and the behavior of
the
client is correct, but I still suspect the problem may be in timing or grammar.
I've got another good idea. Nuance comes with client test cases. They should be
in the directory where Nuance is installed (NuanceDir\client\scripts).
Try to run the demo.mrcp script instead of the UniMRCP client and see how it
works. It should help identify the problem. If I'm not mistaken, it should be
possible to just copy the NuanceDir\client directory, including subdirs, to
another PC and run it there.
Hope this helps
Original comment by achalo...@gmail.com
on 12 Jun 2009 at 5:34
Thanks Arsen,
1. Well, what I meant is that there were two INVITE(s) before BYE :-). I had no
idea
that it is standard procedure to do this. My mistake, sorry. It is quite
reassuring
if this was done on purpose! I will read the specs more carefully.
To me, this was the most bizarre behavior. If this is intentional and lawful
(as the Nuance server replies also seem to imply), then no problem! Again, I am
sorry that I forced you to write such a long explanation (yes, at one point I
too thought that it might be a retry...). I apologize.
2. I will watch this... Let's set it aside for now.
3. Don't spend time on the third problem either. I will try Nuance's own demo
app,
if I can find it and try to push their tech support to help me.
Probably, the content of your grammar is fine. I wonder when the grammar is
analyzed by the server: when it is received, or only when a recognition
operation that uses that grammar is attempted. Probably the latter...
It was a good idea to try the built-in grammar. I think this experiment rules
out the possibility that the content of your grammar was bad, since the
built-in one yields the same result... Not sure what can be wrong about the
timing of the DEFINE-GRAMMAR...
We tried several pcm files. We tried our own "one", recorded by my colleague
with a
microphone and got the same no-match-maxtime result. We tried a file containing
a
few seconds of silence and got no-input-timeout, as expected.
Finally, I was able to configure the Nuance server to record the utterance
received,
I listened to it and it sounded perfect. So, it seems to me that the ball is in
their court.
Interesting problem, though...
Have a nice weekend!
Original comment by Curat...@gmail.com
on 12 Jun 2009 at 10:08
1. Well Vlad, I see no problem over here, go ahead!
3. Indeed, a rather interesting problem. Anyway, I'd suggest trying their
client next and comparing what is going on. Please keep me posted.
Have a nice weekend!
Original comment by achalo...@gmail.com
on 13 Jun 2009 at 3:03
The mystery was solved. UniMRCP client demo works great, very good job, Arsen.
The solution to the problem was much simpler than the problem itself. Two
things
helped: the Nuance documentation and “Cool Edit”.
Nuance has a timer called “incompletetimeout”, which controls how long a
period of silence must follow the caller's speech before the server concludes
that the caller has finished. The default value is 1.5 seconds. The file
“one.pcm” has a reasonable amount of silence after the end of speech (about one
second, or precisely one second, I guess). Our own prompts or Nuance’s demo
prompt have even less.
Apparently, if the duration of the silence is shorter than the value
of “incompletetimeout”, Nuance server believes that the voice was cut short
by some
kind of malfunction and declares this to be an error (015 no-match-maxtime). I
discovered this first by accident after I increased the post-speech silence to
two
seconds; then I read the manual and learned about the parameter. The default
value
of this parameter can be changed in a whole variety of ways on the server. Here
is
how it can be done in the grammar file:
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US"
         version="1.0" mode="voice" root="digit">
  <meta name="incompletetimeout" content="1100"/>
  <rule id="digit">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
    </one-of>
  </rule>
</grammar>
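
The other way around the problem, lengthening the post-speech silence in the audio file, can be checked and applied with a short Python sketch. It assumes the file is headerless 16-bit little-endian mono PCM at 8 kHz, which is not confirmed in this thread. The function names and the amplitude threshold of 200 are arbitrary choices for illustration.

```python
import struct


def trailing_silence_ms(pcm: bytes, sample_rate: int = 8000,
                        threshold: int = 200) -> float:
    """Return the length of the quiet tail of a raw PCM clip in milliseconds."""
    usable = len(pcm) - len(pcm) % 2  # ignore a dangling odd byte
    samples = [s for (s,) in struct.iter_unpack("<h", pcm[:usable])]
    quiet = 0
    for sample in reversed(samples):
        if abs(sample) > threshold:
            break  # reached the end of speech
        quiet += 1
    return quiet * 1000.0 / sample_rate


def pad_silence(pcm: bytes, extra_ms: int, sample_rate: int = 8000) -> bytes:
    """Append extra_ms of digital silence (zero samples) to the clip."""
    return pcm + b"\x00\x00" * (sample_rate * extra_ms // 1000)


# Synthetic clip: 100 ms of "speech" followed by 1 s of silence.
clip = struct.pack("<800h", *([1000] * 800)) + b"\x00\x00" * 8000
padded = pad_silence(clip, 1000)
tail_before = trailing_silence_ms(clip)
tail_after = trailing_silence_ms(padded)
```

Comparing the measured tail with the configured incompletetimeout value shows whether a given prompt is likely to be rejected.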
Original comment by Curat...@gmail.com
on 16 Jun 2009 at 1:57
Hi there,
I also found this to occur with Nuance Recognizer 9.0.3 with Nuance Media Server
5.1.0. According to Nuance, uniMRCP does not send a "RTCP goodbye" packet to
indicate
the end of the RTP stream, and their server will not process the audio until it
is
received and as such, will return with "015 no-match-maxtime".
I think that somehow we'll need to get UniMRCP to send this packet at the end
of an RTP stream; otherwise there will always be this reliance on silence at
the end of the audio samples.
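
For reference, the RTCP BYE packet defined in RFC 3550 is small. Here is a minimal Python sketch of the 8-byte form with a single SSRC and no reason text. The function name and the sample SSRC are made up, and this is not UniMRCP code. Note that RFC 3550 expects RTCP packets to travel in a compound packet that begins with a sender or receiver report, which is not shown here.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_BYE = 203  # packet type for BYE in RFC 3550


def build_rtcp_bye(ssrc: int) -> bytes:
    """Build a bare RTCP BYE packet for one SSRC, without a reason string."""
    first_byte = (RTCP_VERSION << 6) | 1  # V=2, P=0, source count=1
    length = 1  # packet length in 32-bit words, minus one
    return struct.pack("!BBHI", first_byte, RTCP_PT_BYE, length, ssrc)


bye = build_rtcp_bye(0x12345678)
print(bye.hex())
```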
Kind regards,
Derik
Original comment by thirion...@gmail.com
on 20 Oct 2009 at 8:36
Original issue reported on code.google.com by
Curat...@gmail.com
on 10 Jun 2009 at 12:20
Attachments: