MattRiddell / unimrcp

Automatically exported from code.google.com/p/unimrcp
Apache License 2.0
0 stars 0 forks source link

crashes and garbage logging when running unimrcpclient tts demo #33

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. I modified unimrcp tts client to also test pause/resume/stop (see 
attached zip file)
2. Do (almost) like a monkey:
    in the client create multiple sessions fast:
    type: run synth and create multiple sessions (3 or 4 in my case will 
do) by hitting up arrow and enter (command line recall)
3. I experienced several crashes. I was running the client using the VS08 
debugger. 

What version of the product are you using? On what operating system?
Latest trunk release on win32 (r984)

Please provide any additional information below.

The attached unimrcp client log shows 
(1) garbage when logging the phrase of a 
    "Receive SIP Event [nua_i_state] Status 200 "
(2) The log ends when I experienced a crash in libsofia_sip_ua.dll

----

The above test was meant to just test my tts plugin (which I updated to 
the latest revisions) rather than testing the unimrcp client, but anyway, 
it's pretty well reproducable (on my 5 year old and slow laptop) so I 
thought I should report this. It might be related to issue with sofia sip 
as described in issue 20 !?

Additionally, I experienced some crashes before the one documented here 
above. These were probably not related to sofia-sip
I was able to identify some of them. I changed the code to prevent the 
crashes in the future as these were all dereferences of pointers to 0. I 
did not figure out what the exact scenario's were (it was monkey testing 
as described above) and I did not save any loggings.

The (modified) sources are in the attached zip file.

Original issue reported on code.google.com by garmt.no...@gmail.com on 13 Jun 2009 at 9:09

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the issue report. If you experienced an issue, which is more or less
reproducibly, it makes sense to report it.
Running demo synth with your modifications, I've not got Sofia's status phrase
garbled, but I observed several issue, which should be separated.
1. Concurrency and delays caused by Sleep()
You should not use Sleep() in a thread serving multiple sessions, channels. 
This will
definitely not work the way you expected. However, if this helps identify 
problems in
client stack, I never mind.
2. SPEAK-COMPLETE event and double mrcp_application_session_terminate() call.
Please note, mrcp_application_session_terminate() MUST be called only once,
afterward, as soon as session termination has been acknowledged by client stack,
session must be destroyed.
In fact, double mrcp_application_session_terminate() will be sent, if 
SPEAK-COMPLETE
event is received, and this definitely will cause memory related issues and 
crashes.
I believe you met this issue, as this is the only case, when 
sofia_session->mutex can
be NULL. Probably you should do nothing on SPEAK-COMPLETE in the modified 
scenario.
3. Active sessions and quit
Also you should not "quit" (destroy demo framework), while there are active 
sessions
in play. I guess this could be other problem you experienced with NULL pointer 
in
demo framework task.
4. Garbage in Sofia's status phrase string
I assume, this is the actual issue to reproduce. I wonder, if this will be still
reproducible to you, if you ensure point 2 and 3 never happens.
Also please indicate version of Sofia you use. I guess, it's 1.12.10

Original comment by achalo...@gmail.com on 15 Jun 2009 at 8:03

GoogleCodeExporter commented 9 years ago
Thanks for your answer. I thought I managed to cut and paste the whole log from 
the 
dos box, but apparently this was not the case. Yes, it is sophia sip 1.12.10 
(with 
the "improved" tport.c)

ad (1) I had some doubts about the sleeps. I was going to write about this, but 
then 
since you suggested yourself to do this (in 
"http://unimrcp.googlegroups.com/web/
demo_synth_application.c") I continued. So I decided to make short sleeps, and 
since 
I knew that streaming the generated TTS speech would take considerably longer 
than 
those two sleeps, I supposed it would not be too bad. (I must say I initially 
thought that each session would have its own tasks/threads, and that multiple 
sessions would not have trouble with each other sessions' sleeps, although I 
did not 
dive too deep into the code and logging to figure this out. (I should add an 
(optional) thread-id to the log function.)

The alternative solution would be to implement a "delayed send MRCP request, 
but 
conditionally to the current state", but for the purpose of my test (test my 
plugin) 
that was a little too much.

2. SPEAK-COMPLETE event. I'm not sure I fully understand the scope of what you 
are 
saying. In my test I would not expect to see a SPEAK-COMPLETE event at all.
And additionally, I was hoping they were mutually exclusive: a SPEAK-COMPLETE 
should 
not arrive (or at least be processed) if a stop response was received (and vice 
versa).

In my case, the stop complete response should arrive well before the 
SPEAK-COMPLETE 
event. Indeed, if I would start a lot of sessions, the sleeps might get the 
client 
into trouble. In my log file I don't see a SPEAK-COMPLETE event ? 

Note that there is a kind of race condition here. You cannot be sure what is 
first: 
the stop from the client or the speak-complete event from the server. 

I don't think I see this race condition in my log.

I don't understand the double mrcp_application_session_terminate() you see in 
the 
log. In my logging the session pointers for different sessions (one behind the 
finished/destroyed other) is the same, but the sessions are different.

I guess when I used your original pause/resume scenario I met this double 
mrcp_application_session_terminate() issue, and experienced the crashes with 
the 
null pointer (sofia->mutex). That's exactly why I commented out the 
mrcp_application_channel_remove(session,channel);

I'm not sure what my preference is - as a developer i would want my program to 
crash, so that I can improve it - but if i would run this in production i would 
probably prefer that a problem in one session would not have an adverse effect 
on 
the other sessions (or channels). It may turn out to be very difficult to 
implement 
a proper state machine (client) in this multi-tasked, multi-session, 
multi-channel, 
networked, time-constrained and real time environment. So i would prefer the 
"api" 
to be kind of robust against some (hopefully most) cases - even if they are 
developers faults.

(3) as mentioned before, it may be hard to make a proper state machine: one may 
have 
to do some timed garbage collection on sessions/channels as you can never be 
sure 
the server terminates a session. I would prefer to be able to stop and start 
the 
client without a crash taking place... I know this error processing is where it 
really becomes hard, and I realise you just supplied a client demo, not a 
finished 
application.

I commented out the block of code after the resume state (so one delay less, no 
stop 
message to be sent and only one 
mrcp_application_channel_remove(session,channel) - 
after the speak complete event) and I still see the corruption. No crashed 
though.
It is occurs less frequently - though it's not so hard to be a succesful monkey.

Thanks for all your help. Perhaps more later this week...

Original comment by garmt.no...@gmail.com on 17 Jun 2009 at 9:24

GoogleCodeExporter commented 9 years ago
Garmt,
I assume you are on vocation, but when you'll be back, please retest the same 
with
the latest release. I suspect this issue relates to Issue-29 and APR pools 
thread safety.

Original comment by achalo...@gmail.com on 23 Jul 2009 at 5:02

GoogleCodeExporter commented 9 years ago
Any updates? Is it reproducible with 0.7 release?

Original comment by achalo...@gmail.com on 9 Sep 2009 at 6:32

GoogleCodeExporter commented 9 years ago
Last week I tried to reproduce the problem (without upgrading either trunk or 
0.7 
release or APR). I failed to reproduce the problem (maybe I used release 
instead of 
debug executables !?). I will give it another try soon.

Original comment by garmt.no...@gmail.com on 10 Sep 2009 at 11:45

GoogleCodeExporter commented 9 years ago
Thanks for the update, Garmt.

Original comment by achalo...@gmail.com on 10 Sep 2009 at 12:57

GoogleCodeExporter commented 9 years ago
Summarizing the discussion below
http://groups.google.com/group/unimrcp/browse_thread/thread/8d756e7f15d37f98?hl=
en

I admit, this issue indeed existed, but should be already fixed in the trunk.

Related changes are: r1155, r1170, r1135.

Original comment by achalo...@gmail.com on 12 Oct 2009 at 5:01

GoogleCodeExporter commented 9 years ago

Original comment by achalo...@gmail.com on 27 Apr 2010 at 12:02