AaronZhangL / unimrcp

Automatically exported from code.google.com/p/unimrcp
Apache License 2.0
0 stars 0 forks source link

Memory corruption in mpf code #29

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

I'm using libunimrcpclient linked to my application. I use it as a
MRCPv2-only client connected to a Loquendo Speech Server 7.x. I'm doing
stress tests in my application. These stress tests simply consist in
issuing a bunch of TTS requests and a bunch of ASR requests. I have only
one application registered with unimrcp requesting ASR and TTS resources to
do the corresponding requests. Each TTS request is done in its own session.
Each ASR request is done in its own session. Each request is done in its
own MRCP connection.

Arsen, maybe the sample code you made that allowed you to reproduce the
error associated with Issue 27, can serve also to identify this one.

What version of the product are you using? On what operating system?

I use the latest (trunk) libunimrcpclient version compiled by me.

The OS is a Windows Server 2003 SP2 Enterprise Edition.

Please provide any additional information below.

NOTE: I detached this info from Issue 27, because this may be another issue
independent from the one reported at 27.

I attached the traces and also a visual studio 2005 screenshot to
illustrate the point of crash along with the stack contents for each call
from the mpf mm_timer thread context. If you need anything from my side to
help fixing this issue just ask. Thanks in advance.

Original issue reported on code.google.com by cpsoa...@gmail.com on 1 Jun 2009 at 3:46

Attachments:

GoogleCodeExporter commented 9 years ago
cpsoares,
I know almost nothing about you, but I see you know what are doing and I like 
the way
you treat and report the issues.
I assume this should be another issue, as I easily reproduced the issue-27, 
clearly
saw how RTP payload overrode channel identifier and fixed that issue.
Unfortunately I've not managed to reproduce this one so far using the same 
framework.
However I'm testing against UniMRCP server, and I just guess the problem may 
occur
testing against Loquendo instead. Anyway, I see nothing extraordinary from the
message flow and logs.
Can you please test using only TTS or only ASR resource just to narrow down the 
issue?

Original comment by achalo...@gmail.com on 1 Jun 2009 at 4:38

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  After an almost 5 hour "run", I could reproduce the problem using only ASR
requests. The crash is always on the same thread context, ie, mpf mm_timer 
thread,
internal to UNIMRCP.
  I send the VS2005 crash call stack screenshot along with the last trace file I got
from the MRCP side (if you need all the other trace files just ask, please). 
Near the
end you can see in the line that reads "5500 - Add Control Channel" the resource
identifier is garbled. Thank you for your time.

Original comment by cpsoa...@gmail.com on 1 Jun 2009 at 11:17

Attachments:

GoogleCodeExporter commented 9 years ago
Hi cpsoares,
Seems I'm stuck with this issue at the moment. Today is the second day I have 
been
running the same stress/load test without "luck".
However I clearly see memory corruption happened on your end (vs2005 call stack 
and
channel identifier in logs).
For now I'd suggest to play a bit with MPF integration on your side.
At first, do nothing in recog_app_stream_read callback, frame->type will remain
MEDIA_FRAME_TYPE_NONE, and it will prevent sending any RTP packets to Loquendo, 
but
hopefully will help narrow down the issue.
Next, if it never crashes, try to simply fill codec buffer with 0
frame->type |= MEDIA_FRAME_TYPE_AUDIO;
memset(frame->codec_frame.buffer,0,frame->codec_frame.size);
and run the same test again
Please let me know about the results.

Original comment by achalo...@gmail.com on 3 Jun 2009 at 2:13

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  Before I try with the modifications you suggested, I broke my application in two
parts to simplify and used only the media part (where the MRCP connector is 
standing)
to the test. What I do is make several TTS and ASR mixed requests. I put the
application running into purify. It stays running for some minutes, then it 
crashes
with an IPR (Invalid Pointer Read) and then with an unhandled exception from 
that IPR.
  The detailed description from purify for te IPR is in attachment and also are the
traces. In the traces, at the end of the "run" (timestamp 16:18:40.296) the 
"3852 -
Add Control Channel <000073A04A27F399Xspeechsynth> 0BA71508I <-> 0BA71470I [1]" 
line
has also the resource ID garbled.
  Thanks for your time. Meanwhile I'll try to reproduce the crash in the conditions
you suggest in your last message.

  Regards.

Original comment by cpsoa...@gmail.com on 4 Jun 2009 at 4:43

Attachments:

GoogleCodeExporter commented 9 years ago
  Arsen,

  As per your suggestion, I tried doing nothing in recog_app_stream_read callback.
The result was I could not crash the application. As for the other scenario 
feeding
audio filled with zeros as samples, the application crashed after some time. So 
this
clearly points to some issue related to the management of audio buffers in 
UniMRCP
that is causing mem overruns and thus corruption. I think situations like this 
are
very difficult to spot event in purify (where you only get a IPR), the 
situation gets
worse when using apr and memory pools, because the memory is allocated in 
chunks, if
you overrun one buffer and still you are inside the mem chunk, purify will not 
complain.

  Regards.

Original comment by cpsoa...@gmail.com on 5 Jun 2009 at 1:24

GoogleCodeExporter commented 9 years ago
Thanks for the detailed and complete information you continuously provide.

Yes, it's clearly something wrong with audio buffers. Tools such as valgring and
purify are not so helpful in case of memory pool allocation.
From other hand, if application crashes earlier running under purify, it smells 
like
typical race condition issue. I'll try to access purify soon to see if it helps
replicate the problem on my side. Today I've ran more than 1 million ASR and TTS
sessions and this is not the way to go.

Original comment by achalo...@gmail.com on 5 Jun 2009 at 2:50

GoogleCodeExporter commented 9 years ago
  Arsen,

  Do you know an easy way to substitute de pool allocs by malloc of the exact size as
requested? I would not bother for now about the frees as memory can grow a few 
MEGS
for this test before we can spot where the problem resides. If we had that I 
could
put the program easily running on purify to try to spot first the invalid memory
write. This could lead us to very important and useful clues about this problem.

  Regards.

Original comment by cpsoa...@gmail.com on 8 Jun 2009 at 2:29

GoogleCodeExporter commented 9 years ago
cpsoares,
I ran the same stress test under purify and got yet another issue, which caused
client stack to crash. I've just fixed it r967. However I assume this is not the
issue you observed and I'm still trying to get it on my side.
Anyway, I'm planning to make a few changes concerning memory pools and 
allocators.
Currently there is no way to specify memory allocation policy in unimrcp.
More to come...

I use Win XP 32bit, I guess you use Win 2003 32bit. Please confirm it's not 
64bit OS.

Original comment by achalo...@gmail.com on 8 Jun 2009 at 4:40

GoogleCodeExporter commented 9 years ago
Yes 32 bit OS.

Original comment by cpsoa...@gmail.com on 8 Jun 2009 at 5:18

GoogleCodeExporter commented 9 years ago
I remembered other thing related to the scenario where we don't deliver any 
sound
frames to unimrcp - The problem may not be 100% related to streaming, as in this
particular scenario, the recognizer also responds with error messages which are
significantly smaller than the usual result messages with all interpretations 
and
confidences. So, the problem may be anywhere :-D.

Regards.

Original comment by cpsoa...@gmail.com on 8 Jun 2009 at 5:30

GoogleCodeExporter commented 9 years ago
Seems I know where the problem is in. I cannot say I'm happy with what I 
found...
I changed a few things in the code and now client crashes in a minute under the 
same
stress test, which used to run for ages before, strange.
More to come...

Original comment by achalo...@gmail.com on 9 Jun 2009 at 1:48

GoogleCodeExporter commented 9 years ago
Several issues, which may cause client stack to crash have been fixed recently 
in trunk.
For now I'd suggest you to upgrade to recent trunk (r984) and re-test.
Also try to play with #define OWN_ALLOCATOR_PER_POOL option introduced in 
apt_pool.c
Currently it's disabled (commented), but soon I'll enable it by default.
This helps purify and valgrind be more sensitive to memory allocation problems.
By default, APR pools use one common allocator, which actually hides many 
problems
and looks not so convenient to me.

Original comment by achalo...@gmail.com on 15 Jun 2009 at 7:32

GoogleCodeExporter commented 9 years ago
Hi,

I'm going to test the trunk version now. I'll then report the results.

Thanks

Original comment by cpsoa...@gmail.com on 22 Jun 2009 at 5:18

GoogleCodeExporter commented 9 years ago
  Hi,

  Still the memory corruption is there somehow. The sign is as soon as the traces
show a garbled resource identifier the program crashes a few seconds after.
  I will try to do some code review to understand the whole unimrcp (client side)
internals, but this will take me a huge time :-D.
  Next is a piece of the MRCP trace. I don't know if this helps but one common thing
about this memory overrun is that the only place when the garbled resource
identifiers appear is on "Add Control Channel" lines on all stress runs I did.
  Regards.

(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 968 - Receive Answer 0x17fb8050 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Modify Termination 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 968 - Receive Answer 0x46d2558 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Modify Termination 0x46d2558 <0000680C4A41FEC8>
(---------------) 2252 - Signal Signaling Task Message
(---------------) 1364 - Process Control Message
(---------------) 968 - Receive Signaling Task Message [0]
(---------------) 2252 - Receive SIP Event [nua_i_active] Status 200 Call active
(---------------) 968 - Receive Answer 0x472d550 <new> [c:1 a:1 v:0]
(---------------) 968 - Modify Control Channel 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Modify Termination 0x472d550 <00004AA04A41FEC8>
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1209 <->
10.10.0.101:51030
(---------------) 3120 - Open RTP Transmit 10.2.200.100:59320 -> 
10.10.0.101:10046
(---------------) 1364 - Add Control Channel <00004F844A41FEC8@speechrecog>
10.2.200.100:1209 <-> 10.10.0.101:51030 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 3120 - Open RTP Receive 10.2.200.100:59322 <- 
10.10.0.101:10048
playout [50 ms]
(---------------) 3120 - Process MPF Message
(---------------) 3120 - Create Audio Bridge
(---------------) 3120 - Open RTP Receive 10.2.200.100:59324 <- 
10.10.0.101:10050
playout [50 ms]
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Raise App Response 0x17fb8050 <00004F844A41FEC8> [2] 
SUCCESS [0]
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Receive Media Task Message [1]
(---------------) 968 - On Termination Modify 0x472d550 <00004AA04A41FEC8>
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1210 <->
10.10.0.101:51008
(---------------) 1364 - Add Control Channel <0000ð4A41@'C8spe>
10.2.200.100:1210 <-> 10.10.0.101:51008 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Raise App Response 0x46d2558 <0000680C4A41FEC8> [2] 
SUCCESS [0]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x17fb8050 <00004F844A41FEC8>
(---------------) 968 - Send MRCP Request 0x17fb8050 
<00004F844A41FEC8@speechrecog> [1]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x46d2558 <0000680C4A41FEC8>
(---------------) 968 - Send MRCP Request 0x46d2558 
<0000680C4A41FEC8@speechsynth> [1]
(---------------) 1364 - Established TCP/MRCPv2 Connection 10.2.200.100:1211 <->
10.10.0.101:51035
(---------------) 1364 - Add Control Channel <00004AA04A41FEC8@speechsynth>
10.2.200.100:1211 <-> 10.10.0.101:51035 [1]
(---------------) 1364 - Signal Connection Task Message
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1209 <-> 
10.10.0.101:51030
[196 bytes]
(---------------) MRCP/2.0 196 RECOGNIZE 1
(---------------) Channel-Identifier: 00004F844A41FEC8@speechrecog
(---------------) N-Best-List-Length: 2
(---------------) Speech-Language: pt-pt
(---------------) Content-Type: text/uri-list
(---------------) Content-Length: 22
(---------------) 
(---------------) builtin:grammar/digits
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1210 <-> 
10.10.0.101:51008
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 0000680C4A41FEC8@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0
(---------------) 1364 - Process Control Message
(---------------) 968 - Receive Connection Task Message [1]
(---------------) 968 - On Control Channel Modify 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Raise App Response 0x472d550 <00004AA04A41FEC8> [2] 
SUCCESS [0]
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [1]
(---------------) 968 - Receive App MRCP Request 0x472d550 <00004AA04A41FEC8>
(---------------) 968 - Send MRCP Request 0x472d550 
<00004AA04A41FEC8@speechsynth> [1]
(---------------) 1364 - Process Control Message
(---------------) 1364 - Send MRCPv2 Stream 10.2.200.100:1211 <-> 
10.10.0.101:51035
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 00004AA04A41FEC8@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0
(---------------) 1364 - Receive MRCPv2 Stream 10.2.200.100:1210 <->
10.10.0.101:51008 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 0000680C4A41FEC8@speechsynth
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Failed to Find Channel <0000680C4A41FEC8@speechsynth> 
in
Connection 10.2.200.100:1210 <-> 10.10.0.101:51008 [1]
(---------------) 1364 - Receive MRCPv2 Stream 10.2.200.100:1211 <->
10.10.0.101:51035 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00004AA04A41FEC8@speechsynth
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Response 0x472d550 <00004AA04A41FEC8>
(24 10:25:39.109) 1364 - Receive MRCPv2 Stream 10.2.200.100:1209 <->
10.10.0.101:51030 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00004F844A41FEC8@speechrecog
(---------------) 
(---------------) 1364 - Parsed MRCP Message [83]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Response 0x17fb8050 <00004F844A41FEC8>
(24 10:25:39.125) 1364 - Receive MRCPv2 Stream 10.2.200.100:1176 <->
10.10.0.101:51066 [148 bytes]
(---------------) MRCP/2.0 148 START-OF-INPUT 1 IN-PROGRESS
(---------------) Channel-Identifier: 000060F04A41FEC1@speechrecog
(---------------) Proxy-Sync-Id: 04401E1C00010733
(---------------) Input-Type: speech
(---------------) 
(---------------) 1364 - Parsed MRCP Message [148]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Event 0x4733808 <000060F04A41FEC1>
(24 10:25:39.156) 1364 - Receive MRCPv2 Stream 10.2.200.100:1204 <->
10.10.0.101:51039 [141 bytes]
(---------------) MRCP/2.0 141 SPEAK-COMPLETE 1 COMPLETE
(---------------) Channel-Identifier: 000057984A41FEC7@speechsynth
(---------------) Completion-Cause: 000 normal
(---------------) Speech-Marker: ""
(---------------) 
(---------------) 1364 - Parsed MRCP Message [141]
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [3]
(---------------) 968 - Raise App MRCP Event 0x17f910b8 <000057984A41FEC7>
(---------------) 1344 - Signal Application Task Message
(---------------) 968 - Receive Application Task Message [0]
(---------------) 968 - Receive App Request 0x17f910b8 <000057984A41FEC7> [1]
(---------------) 968 - Dispatch Application Request 0x17f910b8 
<000057984A41FEC7> [1]
(---------------) 968 - Terminate Session 0x17f910b8 <000057984A41FEC7>
(---------------) 968 - Remove Control Channel 0x17f910b8 <000057984A41FEC7>
(---------------) 968 - Subtract Channel Termination 0x17f910b8 
<000057984A41FEC7>
(---------------) 1364 - Process Control Message
(---------------) 1364 - Remove Control Channel <000057984A41FEC7@speechsynth> 
[0]
(---------------) 1364 - Close TCP/MRCPv2 Connection 10.2.200.100:1204 <->
10.10.0.101:51039
(---------------) 968 - Subtract Termination 0x17f910b8 <000057984A41FEC7>
(---------------) 1364 - Signal Connection Task Message
(---------------) 968 - Receive Connection Task Message [2]
(---------------) 968 - On Control Channel Remove 0x17f910b8 <000057984A41FEC7>
(24 10:25:39.203) 2252 - Receive SIP Event [nua_r_bye] Status 200 OK
(---------------) 2252 - Receive SIP Event [nua_i_state] Status 200 to BYE
(---------------) 2252 - SIP Call State 0x17f910b8 [terminated]
(---------------) 2252 - Signal Signaling Task Message
(---------------) 968 - Receive Signaling Task Message [1]
(---------------) 968 - Receive Terminate Response 0x17f910b8 <000057984A41FEC7>

Original comment by cpsoa...@gmail.com on 24 Jun 2009 at 11:06

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  After a bit of desperation, the code is really difficult to follow with all the
function pointers and friends, I confess I'm a bit frustrated right now. As I 
said in
the last message I discovered a pattern that is always present when the crash is
near. In one of the last stress runs I found a situation that didn't make the
application crash. As always the problem is in the same spot so maybe you can 
inspect
and see if it gives you any hints. Please look at this piece of trace, on each 
line
the first integer is the thread id so that the thread context can be easily
identified, please see also my comments:

--> Application request to create a resource (TTS resource):

(---------------) 1340 - Receive Application Task Message [0]
(---------------) 1340 - Receive App Request 0x1c86a110 <new> [2]
(---------------) 1340 - Add MRCP Handle 0x1c86a110
(---------------) 1340 - Dispatch Application Request 0x1c86a110 <new> [2]
(---------------) 1340 - Add Control Channel 0x1c86a110 <new@speechsynth>
(---------------) 1340 - Add RTP Termination 0x1c86a110 <new>
(---------------) 5788 - Process Control Message
(---------------) 5788 - Signal Connection Task Message
(---------------) 1340 - Receive Connection Task Message [0]
(---------------) 1340 - On Control Channel Add 0x1c86a110 <new>
(---------------) 5880 - Process MPF Message
(---------------) 5880 - Add Termination
(---------------) 5880 - Add Context
(---------------) 5880 - Process MPF Message
(---------------) 1340 - Receive Media Task Message [0]
(---------------) 1340 - On Termination Add 0x1c86a110 <new>
(---------------) 5880 - Add Termination
(---------------) 1340 - Receive Media Task Message [0]
(---------------) 1340 - On Termination Add 0x1c86a110 <new>

--> This will send the INVITE to the MRCPv2 server (below the SDP)

(---------------) 1340 - Send Offer 0x1c86a110 <new> [c:1 a:1 v:0]
(---------------) 1340 - Local SDP 0x1c86a110 <new>
(---------------) v=0
(---------------) o=UniMRCPClient 0 0 IN IP4 10.2.200.100
(---------------) s=-
(---------------) c=IN IP4 10.2.200.100
(---------------) t=0 0
(---------------) m=application 9 TCP/MRCPv2 1
(---------------) a=setup:active
(---------------) a=connection:new
(---------------) a=resource:speechsynth
(---------------) a=cmid:1
(---------------) m=audio 59282 RTP/AVP 0 8
(---------------) a=rtpmap:0 PCMU/8000
(---------------) a=rtpmap:8 PCMA/8000
(---------------) a=recvonly
(---------------) a=mid:1
(---------------) 5792 - Receive SIP Event [nua_i_state] Status 0 INVITE sent
(---------------) 5792 - SIP Call State 0x1c86a110 [calling]
(30 11:27:02.562) 5792 - Receive SIP Event [nua_r_invite] Status 200 OK
(---------------) 5792 - Receive SIP Event [nua_i_state] Status 200 OK
(---------------) 5792 - SIP Call State 0x1c86a110 [ready]

--> This is the MRCPv2 Server response SDP with resource identifier and MRCP 
port

(---------------) 5792 - Remote SDP 0x1c86a110 <new>
(---------------) v=0
(---------------) o=MRCPv2Server 3455349925 3455349925 IN IP4 10.10.0.101
(---------------) s=-
(---------------) c=IN IP4 10.10.0.101
(---------------) t=3455349925 0
(---------------) m=application 51058 TCP/MRCPv2 1
(---------------) a=channel:00006E144A49F625@speechsynth
(---------------) a=connection:new
(---------------) a=setup:passive
(---------------) a=cmid:1
(---------------) m=audio 10074 RTP/AVP 0
(---------------) a=rtpmap:0 PCMU/8000
(---------------) a=sendonly
(---------------) a=mid:1
(---------------) 5792 - Signal Signaling Task Message
(---------------) 5792 - Receive SIP Event [nua_i_active] Status 200 Call active
(---------------) 1340 - Receive Signaling Task Message [0]
(---------------) 1340 - Receive Answer 0x1c86a110 <new> [c:1 a:1 v:0]
(---------------) 1340 - Modify Control Channel 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Modify Termination 0x1c86a110 <00006E144A49F625>
(---------------) 5880 - Process MPF Message
(---------------) 5788 - Process Control Message
(---------------) 5880 - Create Audio Bridge
(---------------) 5880 - Open RTP Receive 10.2.200.100:59282 <- 
10.10.0.101:10074
playout [50 ms]
(---------------) 1340 - Receive Media Task Message [1]
(---------------) 1340 - On Termination Modify 0x1c86a110 <00006E144A49F625>

--> The UniMRCP now tries to connect to the MRCP port

(---------------) 5788 - Established TCP/MRCPv2 Connection 10.2.200.100:3166 <->
10.10.0.101:51058

--> Now, please note, the error when it happens is *ALWAYS* on this line "Add 
Control
Channel" after a successful connection to the MRCP port, see that the resource
identifier is "10.10.0.101" instead of "00006E144A49F625@speechsynth", this 
will add
a wrong control channel id in the control channel list which will produce
side-effects at the end.

(---------------) 5788 - Add Control Channel <10.10.0.101> 10.2.200.100:3166 <->
10.10.0.101:51058 [1]
(---------------) 5788 - Signal Connection Task Message
(---------------) 1340 - Receive Connection Task Message [1]

--> The thread 1340 has the correct resource id, though!

(---------------) 1340 - On Control Channel Modify 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Raise App Response 0x1c86a110 <00006E144A49F625> [2] 
SUCCESS [0]
(---------------) 3212 - Signal Application Task Message
(---------------) 1340 - Receive Application Task Message [1]
(---------------) 1340 - Receive App MRCP Request 0x1c86a110 <00006E144A49F625>
(---------------) 1340 - Send MRCP Request 0x1c86a110 
<00006E144A49F625@speechsynth> [1]

--> The same thread that had an incorrect resource id now can send the correct
resource id because it was the application that built the MRCP TTS SPEAK 
message.

(---------------) 5788 - Process Control Message
(---------------) 5788 - Send MRCPv2 Stream 10.2.200.100:3166 <-> 
10.10.0.101:51058
[141 bytes]
(---------------) MRCP/2.0 141 SPEAK 1
(---------------) Channel-Identifier: 00006E144A49F625@speechsynth
(---------------) Voice-Name: Amalia
(---------------) Content-Type: text/plain
(---------------) Content-Length: 2
(---------------) 
(---------------)  0

--> MRCPv2 Server responds with an IN-PROGRESS...

(---------------) 5788 - Receive MRCPv2 Stream 10.2.200.100:3166 <->
10.10.0.101:51058 [83 bytes]
(---------------) MRCP/2.0 83 1 200 IN-PROGRESS
(---------------) Channel-Identifier: 00006E144A49F625@speechsynth
(---------------) 
(---------------) 5788 - Parsed MRCP Message [83]

--> And here is the result: the resource id in the message cannot be correctly
identified because it was previously inserted on the list as "10.10.0.101".

(---------------) 5788 - Failed to Find Channel <00006E144A49F625@speechsynth> 
in
Connection 10.2.200.100:3166 <-> 10.10.0.101:51058 [1]

  Thanks in advance and best regards.

Original comment by cpsoa...@gmail.com on 30 Jun 2009 at 5:01

GoogleCodeExporter commented 9 years ago
Hi cpsoares,

I was quite busy during the days, anyway thanks for the update and your 
analyzes.
Hopefully there are only a few questions left.

1. Have you tried to enable OWN_ALLOCATOR_PER_POOL. Does it make any difference 
to you?
2. What version of APR do you use?
3. What if I provide small patch against the APR you use?

Original comment by achalo...@gmail.com on 1 Jul 2009 at 9:44

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  1. Yes I have the macro enabled and apparently doesn't make any difference at all.
  2. I use the APR built directly from the UniMRCP libs trunk. In fact I built APR
DLL two months ago and since then I use it as is. The version is 1.3.4.0 for
libaprutil-1.dll, 1.3.3.0 for libapr-1.dll and 1.2.1.0 for libapriconv-1.dll.
  3. No problem at all, I can rebuid the apr with that patch.

  Thanks a lot.

Original comment by cpsoa...@gmail.com on 1 Jul 2009 at 10:47

GoogleCodeExporter commented 9 years ago
Well, here it goes.

1. Apply attached apr_pools.h and apr_pools.c over your version (apr-1.3.3)
apr/include/apr_pools.h
apr/memory/unix/apr_pools.c
2. Rebuild APR
3. Make sure you have new apr.dll in the path (copy it manually or just build
prepare.vcproj again)
4. Apply patch apt_pool.c.diff to apt_pool.c
5. Rebuild UniMRCP
6. Rebuild your application
7. Finally I hope to get positive feedback at this step.

Original comment by achalo...@gmail.com on 1 Jul 2009 at 11:33

Attachments:

GoogleCodeExporter commented 9 years ago
  Arsen,

  Just an update on this. Running now my application with the patched source as you
suggested. The application is up for 1 day and 22 hours with constant stress 
testing
(ASR and TTS requests) without a single issue. I have roughly since then, 
800000 MRCP
sessions and the same ammount of calls. My milestone is to run 1000000 MRCP 
sessions
and same ammount of calls and I'll proclaim this solved for good :-). I'll keep 
in
touch. BTW, if this proves to solve the issue what will be the official fix? 
Will it
include a patch in APR? I'm not worried as I can live perfectly with a patch on 
APR!!

  Thanks and regards.

Original comment by cpsoa...@gmail.com on 3 Jul 2009 at 11:09

GoogleCodeExporter commented 9 years ago
Thanks for the update and good news.
Actually this is a patch we should live with regardless it fixes this issue or 
not. 
Probably this is the biggest mistake I did in the project. I've never thought 
that
APR memory allocation isn't thread safe. APR allocators are protected by mutex, 
but
still apr_palloc is NOT thread safe. If I introduce wrapper around apr_palloc, 
it
doesn't help, because apr_palloc is used internally in APR routines too. 
Actually I
have no other option but to make APR pools thread safe themselves.
More over, channel identifier was corrupted as a result of typical race, where 
I fork
the request from application to MPF and MRCPv2 connection tasks and both use 
the same
pool to allocate memory from. I'm thinking also about subpools. Most probably 
I'll
create a subpool for MPF context, which should help avoid addition, 
non-desirable
memory pool locks (performance). 
Anyway memory pool must be thread safe to me. More over, there are a number of
patches available for Sofia-SIP out there. Most probably, I'll end up with 
dependency
pack for unimrcp, which will include fixes and additions over the released 
versions
of libraries unimrcp uses.

Original comment by achalo...@gmail.com on 3 Jul 2009 at 12:10

GoogleCodeExporter commented 9 years ago
Hi cpsoares,

About your test : can you confirm to us you run a MRCPv2 test with SIP over UDP 
and 
MRCP over TCP (a TCP socket is created for each session) ?

Thanks

Original comment by amasse.a...@gmail.com on 3 Jul 2009 at 2:21

GoogleCodeExporter commented 9 years ago
  Hi amasse,

  Yes, SIP over UDP and MRCP over TCP with no connection sharing (unimrcp have issues
when MRCP connection sharing is enabled).

Regards.

Original comment by cpsoa...@gmail.com on 3 Jul 2009 at 2:27

GoogleCodeExporter commented 9 years ago
cpsoares,
What kind of issues have you experienced concerning shared connections?
The only issue I'm aware of is: mem pool of shared connection is cleared only 
when
there are no more sessions/channels left in that connection.

Original comment by achalo...@gmail.com on 3 Jul 2009 at 2:35

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  When I first tried this project (a couple of months ago) to do my integration I
used a default xml that had shared connections configured. The result was that, 
due
to race conditions I suppose, some connections were disconnected with pending
sessions (typically sessions that were about to be created). This resulted in 
some
sessions not to convey properly the MRCP commands to the MRCP server and 
stalled. I
inclusively referenced this problem in Issue 28 (I think) without however 
detailing
it because I discovered a workaround by modifying the xml so that it disabled 
the
MRCP connection sharing which in this case solved completely the problem.

  Now, what I can do is, after I reach the milestone I talked about some messages
ago, I can try to do the same stress tests with MRCP connection sharing enebled 
and
post here some traces to illustrate the problem (if it yet exists at all, 
because it
was tried some time ago, in some older revision than the one in trunk right 
now).

  Regards.

Original comment by cpsoa...@gmail.com on 3 Jul 2009 at 3:03

GoogleCodeExporter commented 9 years ago
Well, cpsoares.
Please retest it later and if the issue you observed still exists, report it as 
a new
one, as most probably, it is not related to the memory corruption. 
Meantime I guess, issue-28 can be related to issue-29 and it makes sense to 
retest it
with the patched APR.
Thanks.

Original comment by achalo...@gmail.com on 3 Jul 2009 at 3:56

GoogleCodeExporter commented 9 years ago
Hi Arsen/cpsoares.

I believe the issue on connection sharing has been also resolved by all 
mofications 
done in the issue-7.

Can you remember me how to enable/disable the no connection sharing ?

Note : Arsen, I'm going to create a new subject on the "mail list" about the 
infinite 
increasing memory (APR pool) when connection sharing !!!

Regards.

Original comment by amasse.a...@gmail.com on 6 Jul 2009 at 8:24

GoogleCodeExporter commented 9 years ago
  Hi Amasse,

  You can disable MRCP connnection sharing by setting the "offer-new-connection"
parameter to "1" (the value of "0" means share existing connections or create a 
new
one in case it doesn't exist any connection) on the "connection->agent" context 
of
the xml configuration file you provide to the unimrcp suite. Here is my xml as 
an
example.

  Regards.

<!-- UniMRCP client document -->
<unimrcpclient>
  <!-- Client settings (signaling, media, ...) -->
  <settings>
    <!-- SIP, RTSP signaling agents -->
    <signaling>
      <!-- SofiaSIP MRCPv2 signaling agent -->
      <agent name="SIP-Agent-1" class="SofiaSIP">
        <param name="client-ip" value="10.2.200.100"/>
        <!-- <param name="client-ext-ip" value="auto"/> -->
        <param name="client-port" value="13565"/>
        <param name="server-ip" value="10.10.0.101"/>
        <param name="server-port" value="5060"/>
        <!-- <param name="force-destination" value="1"/> -->
        <!-- <param name="sip-transport" value="tcp"/> -->
        <param name="sip-transport" value="udp"/>
        <param name="ua-name" value="UniMRCP SofiaSIP"/>
        <param name="sdp-origin" value="UniMRCPClient"/>
      </agent>
    </signaling>
    <!-- MRCPv2 connection agents -->
    <connection>
      <agent name="MRCPv2-Agent-1">
        <param name="max-connection-count" value="94"/>
        <param name="offer-new-connection" value="1"/>
      </agent>
    </connection>
    <!-- Media engines -->
    <media>
      <engine name="Media-Engine-1"/>
      <rtp name="RTP-Factory-1">
        <param name="rtp-ip" value="auto"/>
        <!-- <param name="rtp-ext-ip" value="auto"/> -->
        <param name="rtp-port-min" value="15000"/>
        <param name="rtp-port-max" value="60000"/>
        <!-- <param name="playout-delay" value="50"/> -->
        <!-- <param name="max-playout-delay" value="200"/> -->
        <!-- <param name="ptime" value="20"/> -->
        <param name="codecs" value="PCMU PCMA"/>
      </rtp>
    </media>
  </settings>
  <!-- Client profiles -->
  <profiles>
    <!-- MRCPv2 default profile -->
    <profile name="MRCPv2-Default">
      <param name="signaling-agent" value="SIP-Agent-1"/>
      <param name="connection-agent" value="MRCPv2-Agent-1"/>
      <param name="media-engine" value="Media-Engine-1"/>
      <param name="rtp-factory" value="RTP-Factory-1"/>
    </profile>
    <!-- more profiles might be added here -->
  </profiles>
</unimrcpclient>

Original comment by cpsoa...@gmail.com on 6 Jul 2009 at 10:24

GoogleCodeExporter commented 9 years ago
Hi,

Anthony, you can force new connection from server config too 
(unimrcpserver.xml).
<param name="force-new-connection" value="1"/>
In this case unimrcpserver will respond with "new", even if client offers 
"existing".
Looking forward for your post regarding memory allocation and connection 
sharing.

Original comment by achalo...@gmail.com on 6 Jul 2009 at 11:34

GoogleCodeExporter commented 9 years ago
  Hi Arsen,

  My application made more than 2 million MRCP sessions without a glitch, so I think
this issue is now completely solved. Thanks a lot.
  I'll try to provide you an example for issue 28 at the end of the day.

  Regards.

Original comment by cpsoa...@gmail.com on 7 Jul 2009 at 10:43

GoogleCodeExporter commented 9 years ago
Really good news! Thanks.

Original comment by achalo...@gmail.com on 7 Jul 2009 at 1:20

GoogleCodeExporter commented 9 years ago
The patch, which fixes this issues is incorporated into dependency package 
available
from download area.
See the discussion below for more info
http://groups.google.com/group/unimrcp/browse_frm/thread/68a71c71679e3c24

Original comment by achalo...@gmail.com on 23 Jul 2009 at 4:57

GoogleCodeExporter commented 9 years ago

Original comment by achalo...@gmail.com on 25 Aug 2009 at 5:43

GoogleCodeExporter commented 9 years ago
Here is an updated patch for apr-1.4.6 as well as an SRPM for Fedora 16. Note 
that you seem to have to build and install both the i686 and x86_64 versions to 
prevent a conflict.

Original comment by JerseyChewi@gmail.com on 4 May 2012 at 11:27

Attachments:

GoogleCodeExporter commented 9 years ago
Hello folks o/~

In trying to build the latest unimrcp on Ubuntu 12.04/amd64 I hit a #78 which 
references this bug.

I brought this up on dev@apr.apache.org and this seems to be the conclusio so 
far:

> The patch adds a mutex to apr_pool_t and locks the mutex in calls to
> apr_palloc, apr_pool_clear and apr_pool_destroy.  That may make those
> individual calls threadsafe but it doesn't really make pool use as a
> whole threadsafe.  If one thread clears a pool every other thread using
> the pool is left with dangling references.  It looks like a lot of
> locking for no real gain.

http://mail-archives.apache.org/mod_mbox/apr-dev/201304.mbox/%3C878v4gjh3u.fsf%4
0ntlworld.com%3E

Original comment by i.ga...@brainsware.org on 18 Apr 2013 at 11:20

GoogleCodeExporter commented 9 years ago
Hello Igor,

I noticed your post on the APR dev list yesterday and will try to respond there 
later today. Thanks for bringing this up.

Original comment by achalo...@gmail.com on 18 Apr 2013 at 3:03