Status: Closed
The release should be 0.8.0. In fact, I downloaded the latest trunk version a few weeks after 0.8.0 was released...
Original comment by Curat...@gmail.com
on 26 Dec 2009 at 5:18
SVN revision is 1345.
Original comment by Curat...@gmail.com
on 26 Dec 2009 at 5:22
Thanks for raising this, Vlad.
First of all, there is a quick workaround, which you may want to use temporarily:
libs/mrcpv2-transport/include/mrcp_connection.h
-#define MRCP_STREAM_BUFFER_SIZE 1024
+#define MRCP_STREAM_BUFFER_SIZE 4096
Now let's better understand what is going on and where the actual issue is. UniMRCP does support message segmentation (TCP stream), and it seamlessly processes the cases where segmentation occurs in the message body. However, when segmentation occurs in the message start-line or header fields, the routine stores the processed (segmented) buffer, reads the remaining stream, and starts message parsing over. This works reliably when the length of the message start-line and header fields is less than 1024 bytes. I thought 1024 bytes would be more than enough, but I didn't expect there could be so many vendor-specific params ...
Certainly I'm going to support this case too. The routine should be able to continue parsing header fields, not start over, when segmentation occurs on them. That would be a more efficient implementation than the current one and would not be limited to 1024, 4096, or any other fixed size. Everything is clear, feasible and reliable. However, another concern is the upcoming release. This release already introduces so many changes, enhancements and additions that I'd prefer not to make last-minute changes, especially in core components such as message parsing, unless you think this is an urgent issue.
Original comment by achalo...@gmail.com
on 26 Dec 2009 at 8:48
Original comment by achalo...@gmail.com
on 26 Dec 2009 at 8:49
Hi Arsen, thanks for the tip; obviously it works fine with MRCP_STREAM_BUFFER_SIZE set to a higher value.
The issue is not at all urgent. Could you please, though, explain or remind me what MRCP_MESSAGE_TYPE_REQUEST is used for and in what circumstances? This is more important for me at this stage.
As far as message segmentation goes, yes, I am afraid you cannot avoid taking it into consideration. This is why they give you the message length on the event line. Out of curiosity, what messages are you referring to when you say that "UniMRCP does support message segmentation (TCP stream) and it seamlessly processes cases, when segmentation occurs on message body"? Are you thinking of MRCPv1? I believe that in MRCPv2 you do not have to deal directly with SIP messages, since Sofia does that; as far as MRCPv2 events go, they always have a "length" field on the event-line, the only problem being to extract that "length" and make sure that you read precisely "length" bytes from the TCP socket.
Cheers.
Original comment by Curat...@gmail.com
on 27 Dec 2009 at 1:14
Hi Vlad,
Yes, it should be working fine with a buffer bigger than the length of the message headers.
I think you shouldn't worry much about the MRCP_MESSAGE_TYPE_REQUEST issue, as it's just a side effect. Normally you should receive only MRCP responses and events, not requests.
The same technique is used for parsing RTSP/MRCPv1 and MRCPv2 messages. So I meant both of them, and yes, SIP messages are parsed by Sofia.
BTW, the issue is that I don't allocate a buffer for the whole message (start-line + headers + body); instead, I read a portion (segment) of the stream into the buffer and parse the MRCPv2 or RTSP message on the fly. That buffer is limited (1024, 4096 or whatever). So the case where the length of the headers is bigger than the length of the buffer isn't supported now. It's not typical TCP stream segmentation, but the issue is related to the same routine.
So the routine should be capable of parsing the header fields available in the read buffer and continuing (not starting over) with the next read.
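The "continue, don't start over" idea can be sketched with a toy line consumer over a fixed buffer (illustrative names only, not UniMRCP code): consume every complete header line currently in the buffer, then scroll the unfinished tail to the front so the next network read appends to it.

```c
#include <string.h>

/* Toy sketch of resumable parsing over a fixed buffer. */
typedef struct {
    char   buf[64];   /* deliberately tiny to force segmentation */
    size_t used;      /* bytes currently held */
} stream_t;

/* "Parse" every complete LF-terminated header line in the buffer, then
 * scroll the partial trailing line to the front for the next read. */
static int consume_lines(stream_t *s)
{
    int count = 0;
    size_t i, start = 0;
    for (i = 0; i < s->used; i++) {
        if (s->buf[i] == '\n') {
            count++;          /* a complete header line ends here */
            start = i + 1;
        }
    }
    memmove(s->buf, s->buf + start, s->used - start);
    s->used -= start;
    return count;
}

/* Append a chunk of the TCP stream (as if returned by recv()) and parse.
 * A real implementation must also handle len exceeding the free space. */
static int feed(stream_t *s, const char *data, size_t len)
{
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    return consume_lines(s);
}
```

Each call to feed() makes progress on whatever complete lines are available, so a header field split across two reads is finished on the second call instead of forcing the whole message to be reparsed.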
Original comment by achalo...@gmail.com
on 27 Dec 2009 at 6:48
If I quadruple the number of channels which start v2 sessions relatively simultaneously, the value 4096 for MRCP_STREAM_BUFFER_SIZE seems to become insufficient. This surprised me, because I thought a buffer of this size was allocated to each channel, so increasing the number of channels shouldn't have mattered.
If I quadruple MRCP_STREAM_BUFFER_SIZE, then four channels apparently work.
Original comment by Curat...@gmail.com
on 29 Dec 2009 at 3:50
Attachments:
Original comment by achalo...@gmail.com
on 6 Jan 2010 at 9:36
Hi Vlad,
This issue has been fixed in trunk (r1388). Now it's working fine with and without Vendor-Specific-Parameters, whatever the size of the headers or of the whole message. The routine is also more tolerant of TCP segmentation.
Please give it a try and provide feedback when possible.
Original comment by achalo...@gmail.com
on 8 Jan 2010 at 2:55
Hi Arsen,
I tried it and it is doing some weird things. Sorry, buddy :-(.
Firstly, in the log file the message is broken into pieces separated by other log statements, which makes the process of reading it very hard. For instance:
UIS1TN3 0901 17:30:11.032 Receive MRCPv2 Stream 10.10.10.65:2187 <-> 10.10.10.69:6075 [1023 bytes]
MRCP/2.0 3221 9 200 COMPLETE
Channel-Identifier: 14@speechrecog
Logging-Tag:
Fetch-Timeout: 10000
Vendor-Specific-Parameters:
server.osrspeechrecog.result.mediatype=application/x-vnd.speechworks.emma+xml;strictconfidencelevel=1;mrcpv=2.06
Vendor-Specific-Parameters: confidencelevel=500
Vendor-Specific-Parameters: sensitivity=0.500
Vendor-Specific-Parameters: speedvsaccuracy=0.789
Vendor-Specific-Parameters: timeout=1000
Vendor-Specific-Parameters: completetimeout=2222
Vendor-Specific-Parameters: incompletetimeout=3333
Vendor-Specific-Parameters: swirec_nbest_list_length=5
Vendor-Specific-Parameters: swirec_max_cpu_time=20000
Vendor-Specific-Parameters: server.osrspeechrecog.result.sendnomatch=false
Vendor-Specific-Parameters: swirec_tenant_name=Nuance
Vendor-Specific-Parameters: swirec_company_name=Nuance
Vendor-Specific-Parameters: swirec_application_name=UniMRCP
Vendor-Specific-Parameters: maxspeechtimeout=-1
Vendor-Specific-Parameters: swirec_barge_in_mode=normal
Vendor-Specific-Parame [mrcp_client_connection:525]
UIS1TN3 0901 17:30:11.032 Truncated MRCP Message [1001]
[mrcp_stream:388]
UIS1TN3 0901 17:30:11.032 Scroll MRCP Stream [mrcp_stream:391]
UIS1TN3 0901 17:30:11.032 Receive MRCPv2 Stream 10.10.10.65:2187 <-> 10.10.10.69:6075 [1001 bytes]
ters: swirec_load_adjusted_speedvsaccuracy=on
In this case, "Vendor-Specific-Parameters: swirec_load_adjusted_speedvsaccuracy=on" was logged in two separate statements: "Vendor-Specific-Parame" and "ters: swirec_load_adjusted_speedvsaccuracy=on".
Secondly, the parsing seems screwed up, and there are <CR><LF> pairs inserted spuriously in the message. I use the same code which worked fine with r1380 and earlier, after I increased MRCP_STREAM_BUFFER_SIZE. I will post my log file and revert to r1380, if SVN allows me to do so.
Perhaps I do not understand exactly what you are doing, but here is the way I would handle the response messages:
1. I would first read a few bytes just to make sure that I captured the "message-length" of the "response-line" (draft-ietf-speechsc-mrcpv2-20, 5.3). Let's say that I read "n" bytes in this first batch.
2. I would extract the length of the entire message from the response-line and read another batch of precisely "message-length - n" bytes. Something like this would allow me to read all the bytes of the current response; no more and no less.
3. I would do the parsing only after I read the entire message.
This is pretty much how we do it in our applications. It allows us to get an entire TCP/IP message using two calls to recv(), at the most.
As for the buffer required to read the message, a combination of an initial malloc() perhaps followed by a realloc() (if the current message is larger than the previous one) should achieve the purpose without consuming too many resources. That small 1 KB buffer where you want to read everything seems inefficient to me, because it may require several calls to recv(). The on-the-fly parsing is technically correct but probably requires careful programming, and it should not even be required if an entire message is read in two shots. Just my guess...
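Steps 1-3 above, together with the realloc() idea, could be sketched as follows. All names here are hypothetical; the socket is abstracted behind a function pointer so the sketch stays self-contained, and the start-line is assumed to arrive complete in the first read (a real implementation must handle a split start-line too).

```c
#include <stdlib.h>
#include <string.h>

/* Abstract reader standing in for recv() on a connected socket. */
typedef long (*read_fn)(void *ctx, char *dst, size_t max);

typedef struct {
    char  *buf;
    size_t cap;
} msg_buf_t;

/* Grow the buffer geometrically so reallocations stay rare. */
static int msg_buf_reserve(msg_buf_t *m, size_t need)
{
    size_t cap;
    char *p;
    if (m->cap >= need)
        return 0;
    cap = m->cap ? m->cap : 1024;
    while (cap < need)
        cap *= 2;
    p = realloc(m->buf, cap);
    if (!p)
        return -1;
    m->buf = p;
    m->cap = cap;
    return 0;
}

/* Step 1: read an initial chunk. Step 2: take message-length from the
 * start-line and read exactly the missing bytes. Returns total message
 * length, or -1 on error. */
static long read_message(msg_buf_t *m, read_fn rd, void *ctx)
{
    long n, total;
    if (msg_buf_reserve(m, 1024) < 0)
        return -1;
    n = rd(ctx, m->buf, 1024);
    if (n <= 0 || strncmp(m->buf, "MRCP/2.0 ", 9) != 0)
        return -1;
    total = strtol(m->buf + 9, NULL, 10);
    if (total <= 0 || msg_buf_reserve(m, (size_t)total) < 0)
        return -1;
    while (n < total) {          /* usually at most one more read */
        long r = rd(ctx, m->buf + n, (size_t)(total - n));
        if (r <= 0)
            return -1;
        n += r;
    }
    return total;
}

/* In-memory stand-in for recv(), for demonstration. */
typedef struct { const char *data; size_t len, off; } fake_stream_t;

static long fake_read(void *ctx, char *dst, size_t max)
{
    fake_stream_t *f = ctx;
    size_t n = f->len - f->off;
    if (n > max)
        n = max;
    memcpy(dst, f->data + f->off, n);
    f->off += n;
    return (long)n;
}
```

The buffer grows only when a message exceeds the previous maximum, so steady-state traffic causes no allocations at all.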
Thanks and best regards.
Original comment by Curat...@gmail.com
on 10 Jan 2010 at 3:40
Attachments:
Hi Vlad,
The approach you suggested is very clear to me. Conceptually everything is right, and I agree it should be quite straightforward to implement.
The only reason I didn't go this way is the malloc() and possible realloc() calls. The method I use, on-the-fly parsing of the stream, avoids dynamic memory allocation. Yes, this method requires careful programming, but at least it works, or should work, reliably and efficiently enough, I think.
It's probably reasonable to make the size of that buffer configurable, so the behavior can be optimized, though 1 KB should be OK by default.
Thanks for the feedback.
Original comment by achalo...@gmail.com
on 10 Jan 2010 at 8:11
Thanks, Arsen. Setting the theory aside, maybe you can take a look and determine why my test fails. Is there anything I need to do differently? I reloaded r1380 and the vendor-specific parameters look correct. I will upload the log file.
Here is the code I use (I did not omit the statements which do not pertain to UniMRCP, because they are simple and self-explanatory).
void RetrieveVendorSpecificHeaderValues( int line, HeadersResult* presult,
                                         mrcp_message_t* mrcp_message )
{
    mrcp_generic_header_t* generic_header;
    int idx;
    const apt_pair_t* p_apt_str_pair;
    string name;
    string value;

    generic_header = (mrcp_generic_header_t*)mrcp_generic_header_get( mrcp_message );
    if( !generic_header )
    {
        LOGE( line, "RetrieveVendorSpecificHeaderValues(): mrcp_generic_header_get() returned NULL" );
        presult->SetHeaderAndValue( "Vendor-Specific-Parameters", "" );
        return;
    }

    for( idx = 0; idx < apt_pair_array_size_get( generic_header->vendor_specific_params ); idx++ )
    {
        p_apt_str_pair = apt_pair_array_get( generic_header->vendor_specific_params, idx );
        /* the lengths are size_t; cast to int for the "%.*s" precision */
        LOGD( line, "Vendor-Specific-Parameters: name: \"%.*s\", value: \"%.*s\""
              , (int)p_apt_str_pair->name.length
              , p_apt_str_pair->name.buf
              , (int)p_apt_str_pair->value.length
              , p_apt_str_pair->value.buf );
        name.assign( p_apt_str_pair->name.buf, p_apt_str_pair->name.length );
        value.assign( p_apt_str_pair->value.buf, p_apt_str_pair->value.length );
        name += "=";
        name += value;
        presult->SetHeaderAndValue( "Vendor-Specific-Parameters", name );
    }
}
Of course, this is not urgent.
Thanks a lot and best regards!
Original comment by Curat...@gmail.com
on 11 Jan 2010 at 1:20
Attachments:
Sorry, Vlad.
I thought you just had some doubts over the implementation, but as it turned out, the routine didn't even work :(
Actually, parsing of name/value pairs (Vendor-Specific-Parameters) was not coherent with parsing of the other header fields. I'd say it worked with r1380 and earlier versions only by chance, as the parsed result (name/value pairs) pointed into the read buffer, which might be overwritten at any time by the next socket_recv() attempt.
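For illustration, the kind of failure a pair pointing into a reused read buffer produces can be reproduced in a few lines. These are illustrative structures, not the actual UniMRCP ones; the real fix copies the strings out of the buffer (in UniMRCP presumably into a pool).

```c
#include <stdlib.h>
#include <string.h>

typedef struct { const char *buf; size_t length; } str_ref_t;  /* points in */
typedef struct { char *buf; size_t length; } str_copy_t;       /* owns copy */

/* Copy len bytes out of the shared buffer into owned storage. */
static str_copy_t copy_out(const char *src, size_t len)
{
    str_copy_t s;
    s.buf = malloc(len + 1);
    memcpy(s.buf, src, len);
    s.buf[len] = '\0';
    s.length = len;
    return s;
}

/* Returns 1 when the copied pair survived a buffer refill and the
 * pointing pair did not. */
static int demo(void)
{
    char read_buf[32];
    int ok;
    str_ref_t  ref;
    str_copy_t dup;

    strcpy(read_buf, "confidencelevel=500");
    ref.buf = read_buf;                  /* just points into the buffer */
    ref.length = 15;
    dup = copy_out(read_buf, 15);        /* copies "confidencelevel" out */

    strcpy(read_buf, "sensitivity=0.500");   /* simulated next recv() */

    ok = memcmp(dup.buf, "confidencelevel", 15) == 0 &&
         memcmp(ref.buf, "confidencelevel", 15) != 0;
    free(dup.buf);
    return ok;
}
```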
Hopefully finally fixed in r1394.
Thanks again.
Original comment by achalo...@gmail.com
on 11 Jan 2010 at 2:46
Hi Arsen,
The parsing seems correct now in r1394. But I have other troubles and could test it just once. I will let you know when I learn more. There is one thing that I would suggest, if you don't mind. The message segmentation creates log statements which may break the parameter response received from the server into arbitrary pieces. E.g.:
Vendor-Specific-Parame [mrcp_client_connection:525]
UIS1TN3 1101 19:26:00.133 Truncated MRCP Message [1001]
[mrcp_stream:388]
UIS1TN3 1101 19:26:00.133 Scroll MRCP Stream [mrcp_stream:391]
UIS1TN3 1101 19:26:00.133 Receive MRCPv2 Stream 10.10.10.65:4604 <-> 10.10.10.69:6075 [1001 bytes]
ters: swirec_load_adjusted_speedvsaccuracy=on
Vendor-Specific-Parameters: swirec_magic_word_conf_thresh=500
Vendor-Specific-Parameters: swirec_busy_cpu_start=40
Vendor-Specific-Parameters: swirec_acoustic_adapt_suppress_adaptation=0
Vendor-Specific-Parameters: swirec_audio_environment=default
Vendor-Specific-Parameters: swirec_backward_compatible_confidence_scores=2.0
Vendor-Specific-Parameters: swirec_extra_nbest_keys=SWI_meaning SWI_literal
SWI_grammarName
Vendor-Specific-Parameters: swirec_max_logged_nbest=2
Vendor-Specific-Parameters: swirec_max_search_time=5000
Vendor-Specific-Parameters: swirec_max_sentences_tried=999999
Vendor-Specific-Parameters: swirec_normal_cpu_start=15
Vendor-Specific-Parameters: swirec_pegged_cpu_start=101
Vendor-Specific-Parameters: swirec_return_waveform=1
Vendor-Specific-Parameters: swirec_selective_barge_in_conf_thresh=500
Vendor-Specific-Parameters: swirec_suppress_event_logging=0
Vendor-Specific-Parameters: swirec_suppress_waveform_logg
[mrcp_client_connection:525]
Two log statements can be removed:
Truncated MRCP Message [1001] [mrcp_stream:388]
and
Scroll MRCP Stream [mrcp_stream:391]
(or enabled only in "debug" mode). But the third one is exactly the one which logs the parameters, so it can be removed only at the expense of not logging the parameters at all.
If you are totally opposed to malloc/realloc, perhaps you could consider making the buffer configurable, or at least much larger than 1 KB. In our program we use a few hundred 16 KB buffers without any significant impact on the run-time memory footprint.
Thanks a lot!
Original comment by Curat...@gmail.com
on 12 Jan 2010 at 5:02
Well, Vlad.
I'll make that buffer configurable in the coming days, so you'll be able to set it as large as needed.
I agree it's not an issue on any recent hardware, as there are several gigabytes of memory available, but the project can still be used on embedded devices, where dynamic memory allocation and memory itself might cause issues.
Original comment by achalo...@gmail.com
on 12 Jan 2010 at 12:08
Original issue reported on code.google.com by Curat...@gmail.com on 26 Dec 2009 at 5:15
Attachments: