bryangrim / darkice

Automatically exported from code.google.com/p/darkice

Darkice trunk (r510) loses connection after ~2 hours #84

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. start streaming and wait
2. after ~2 hours the connection is lost for no apparent reason
3. darkice produces no output, even with -v 10

What is the expected output? What do you see instead?
I expect no unexplained connection losses; instead, the connection drops after about two hours.

What version of the product are you using? On what operating system?
I use darkice trunk (SVN revision 510) on Ubuntu 13.04 64-bit, streaming to Icecast 2.3.3 on localhost.

Please provide any additional information below.

When the connection breaks, I have these log entries on icecast side:
[2013-05-03  20:13:52] INFO source/get_next_buffer End of Stream /stream.mp3
[2013-05-03  20:13:52] INFO source/source_shutdown Source "/stream.mp3" exiting

At this time, darkice itself does not say anything, even when started with "-v 
10". It just sits there as if there were no problem at all.

Since I am streaming to localhost, bad network cannot be an issue.

When using darkice version 1.1 with the exact same configuration and server, 
the stream runs for days and days without any problem.

Original issue reported on code.google.com by daniel.e...@gmail.com on 3 May 2013 at 6:48

GoogleCodeExporter commented 8 years ago
Starting regression tests; it will take some days until I have found the change that 
causes this. But looking at all the diffs between r500 and r510, I guess it 
has to be something in the MultiThreadedConnector changes in r510.

Original comment by daniel.e...@gmail.com on 4 May 2013 at 7:03

GoogleCodeExporter commented 8 years ago
While testing r509 I found out that r510 does not reconnect, while r509 does.

I started streaming, then shut down icecast, waited some seconds and fired 
icecast back up. Result:

r509:
04-May-2013 09:09:20 Exception caught in BufferedSink :: write3
04-May-2013 09:09:20 MultiThreadedConnector :: sinkThread reconnecting  0
04-May-2013 09:09:20 couldn't write all from encoder to underlying sink 1210
04-May-2013 09:09:21 MultiThreadedConnector :: sinkThread reconnecting  0
[...]
04-May-2013 09:09:27 MultiThreadedConnector :: sinkThread reconnecting  0
04-May-2013 09:09:28 HTTP/1.0 200

and I'm back on air.

r510:
04-May-2013 08:58:00 Exception caught in BufferedSink :: write3
04-May-2013 08:58:00 MultiThreadedConnector :: sinkThread reconnecting  0
04-May-2013 08:58:00 couldn't write all from encoder to underlying sink 970

and there it sits, looking at the server with googly eyes, and does not 
reconnect or exit.

Again pointing to the MultiThreadedConnector changes.

Original comment by daniel.e...@gmail.com on 4 May 2013 at 8:05

GoogleCodeExporter commented 8 years ago
I have built r510 without the MultiThreadedConnector changes. It has now been 
running for 4 hours without any issue. It does not lose the connection without 
a reason, and it reconnects when there really is a reason for a connection loss 
(like an icecast restart).

Attached is a patch for r510 that reverts only the MultiThreadedConnector 
related changes.

Original comment by daniel.e...@gmail.com on 4 May 2013 at 8:57


GoogleCodeExporter commented 8 years ago
Reversion applied in r514. Thanks!

We are investigating the deadlock issue.

Original comment by rafael2k...@gmail.com on 14 May 2013 at 3:16

GoogleCodeExporter commented 8 years ago
Yep, the MultiThreadedConnector had a problem.
I tested the algorithm in a separate program and a race could happen.
Moving one pthread_mutex_lock() fixes the issue: we should hold the lock on 'mutex_done' before letting the consumers run.

Line 262:
pthread_cond_broadcast(&cond_start); // kick the waiting consumers to look again
pthread_mutex_lock(&mutex_done);     // LOCK early to prevent missing a change of the 'done' condition variable
pthread_mutex_unlock(&mutex_start);  // UNLOCK, release the consumers' cond variable, now they can run

Original comment by oetelaar.automatisering on 14 May 2013 at 9:39
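
To make the ordering concrete, here is a minimal, self-contained sketch of a producer/consumer handshake along the lines described in the comment above. It is not the DarkIce source: the names data_ready and consumers_done, and the single consumer thread, are illustrative assumptions; only the mutex_start/cond_start and mutex_done/cond_done pairing mirrors the quoted lines. The point is that the producer takes mutex_done before releasing mutex_start, so a fast consumer cannot signal cond_done before the producer is ready to wait for it.

/*
 * Sketch only, not DarkIce code. data_ready, consumers_done and the single
 * consumer thread are invented for illustration.
 * Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mutex_start = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_start  = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t mutex_done  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_done   = PTHREAD_COND_INITIALIZER;
static int data_ready     = 0;   /* predicate guarding cond_start */
static int consumers_done = 0;   /* predicate guarding cond_done  */

static void *consumer(void *arg)
{
    (void) arg;

    /* wait until the producer has published a chunk of data */
    pthread_mutex_lock(&mutex_start);
    while (!data_ready)
        pthread_cond_wait(&cond_start, &mutex_start);
    pthread_mutex_unlock(&mutex_start);

    /* ... write the chunk to the underlying sink ... */

    /* report completion back to the producer */
    pthread_mutex_lock(&mutex_done);
    consumers_done = 1;
    pthread_cond_signal(&cond_done);
    pthread_mutex_unlock(&mutex_done);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, consumer, NULL);

    /* producer: publish the data and kick the consumers */
    pthread_mutex_lock(&mutex_start);
    data_ready = 1;
    pthread_cond_broadcast(&cond_start); /* kick the waiting consumers          */
    pthread_mutex_lock(&mutex_done);     /* LOCK mutex_done before releasing    */
                                         /* mutex_start, so a fast consumer     */
                                         /* cannot signal cond_done before the  */
                                         /* producer is ready to wait for it    */
    pthread_mutex_unlock(&mutex_start);  /* now the consumers may run           */

    /* wait until every consumer has finished with this chunk */
    while (!consumers_done)
        pthread_cond_wait(&cond_done, &mutex_done);
    pthread_mutex_unlock(&mutex_done);

    pthread_join(tid, NULL);
    puts("all consumers finished");
    return 0;
}

In this sketch the predicate variables already prevent lost wakeups on their own; the early lock additionally guarantees that the producer owns mutex_done before any consumer can finish, which is exactly the ordering the three quoted lines establish.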

GoogleCodeExporter commented 8 years ago
There is actually another problem with this.
The consumer threads might not yet be listening for the producer before 
cond_start/mutex_start are changed, so the wakeup can be lost.
I will try to fix this too, very soon.

Original comment by oetelaar.automatisering on 15 May 2013 at 7:50
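
For reference, here is a minimal sketch of one common way to close that second window. It is not the fix the commenter went on to apply, and the names NUM_CONSUMERS, consumers_waiting and cond_ready are invented for illustration: the producer does not change the shared state or broadcast until every consumer has taken mutex_start and registered itself, and data_ready doubles as a predicate, so a broadcast cannot be lost even if a consumer has not yet blocked in pthread_cond_wait().

/*
 * Sketch only, not DarkIce code. NUM_CONSUMERS, consumers_waiting and
 * cond_ready are invented names for illustration.
 * Build with: cc -pthread sketch2.c
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_CONSUMERS 2

static pthread_mutex_t mutex_start = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_start  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  cond_ready  = PTHREAD_COND_INITIALIZER;
static int consumers_waiting = 0;  /* how many consumers have checked in   */
static int data_ready        = 0;  /* predicate guarding cond_start        */

static void *consumer(void *arg)
{
    (void) arg;

    pthread_mutex_lock(&mutex_start);
    consumers_waiting++;               /* tell the producer we are listening */
    pthread_cond_signal(&cond_ready);
    while (!data_ready)                /* predicate: the wakeup cannot be    */
        pthread_cond_wait(&cond_start, /* lost even if the broadcast races   */
                          &mutex_start);
    pthread_mutex_unlock(&mutex_start);

    /* ... write the data chunk to this consumer's sink ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_CONSUMERS];
    for (int i = 0; i < NUM_CONSUMERS; i++)
        pthread_create(&tid[i], NULL, consumer, NULL);

    pthread_mutex_lock(&mutex_start);
    while (consumers_waiting < NUM_CONSUMERS)          /* don't signal into   */
        pthread_cond_wait(&cond_ready, &mutex_start);  /* the void: wait for  */
                                                       /* all check-ins first */
    data_ready = 1;
    pthread_cond_broadcast(&cond_start);
    pthread_mutex_unlock(&mutex_start);

    for (int i = 0; i < NUM_CONSUMERS; i++)
        pthread_join(tid[i], NULL);
    puts("all consumers handled the chunk");
    return 0;
}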