arut / nginx-rtmp-module

NGINX-based Media Streaming Server
http://nginx-rtmp.blogspot.com
BSD 2-Clause "Simplified" License
13.36k stars 3.51k forks source link

Sometimes, streaming does not start #126

Open atrottmann opened 11 years ago

atrottmann commented 11 years ago

I am debugging an issue in which sometimes, streams fail to start. In one particular setup, this happens to me maybe one in 20 times I try to start a stream.

I turned on the debugging log and did some digging.

The first thing I saw with an unsuccessful playback is "handler 0 failed". This is written in ngx_rtmp_handler.c in the function ngx_rtmp_receive_message as a failure of *evh().

It turns out the evh is in this case was ngx_rtmp_amf_message_handler in ngx_rtmp_receive.c which returnes NGX_ERROR when ph() does not return NGX_OK.

In the observed cases, *ph was ngx_rtmp_send_shared_packet returning -2 (so, NGX_AGAIN) as a return value of ngx_rtmp_send_message. This was due to the "drop packet? note we always leave 1 slot free" code that returns NGX_AGAIN when certain thresolds are met. It turns out in my case, nmsg was going up to 256 when this happened.

In all the cases I observed, "somehow" it happened that s->connection->write->active was true for a certain time, so ngx_rtmp_send_message never executed the hgx_rtmp_send(), so probably the messages were just queued and at one point, it lead to the error return, then the "handler 0 failed" and finally the disconnection.

I did not manage to exactly find out what happens, it appears to me tere are quite a lot "play" and "pause" commands sent by the player in the beginning of the streaming.

I have saved error logs of the same process, first when it didn't work (error-bad.log), and then, with the same player, same server and same content, when it worked (error-good.log). You can get these logs from http://guardian.werft22.net/public/errorlogs.zip because they are a bit big to paste here.

D you have any idea about the reason of this behaviour?

Kind regards, Andreas Trottmann

arut commented 11 years ago

active means write event is already scheduled, no need to call send explicitly at that point since it will return EAGAIN. The reason for that is the client stopped receiving any TCP data from server.

Your debug log is no good for tracing that kind of problem. It seems you have error_log logs/error.log debug_core; Please replace debug_core with debug to have more info about low-level IO.

atrottmann commented 11 years ago

Thank you very much!

I have made another pair of log files with "debug" as log level.

http://guardian.werft22.net/public/errorlogs-2.zip

Kind regards, Andreas Trottmann

atrottmann commented 10 years ago

I have been looking into this problem again, since it still has been happening, always "once in a while", never "reliable" enough for a good way of debugging :-) :-(

However, it appears that it only happens when using a certain Flash player, which is sending a "pause" command very quickly after streaming starts - it's the way of this player to determine the available bandwidth.

It looks like this is when nginx sometimes drops the connection, when it calls

ngx_rtmp_send_status(s, "NetStream.Pause.Notify", (...)

in line 631, of ngx_rtmp_play_module.c, in the function ngx_rtmp_play_pause. Apparently ngx_rtmp_send_status then returns NGX_AGN. Because NGX_AGN is != NGX_OK, ngx_rtmp_play_pause then returns NGX_ERROR, and this then causes ngx_rtmp_receive_message to return an error state as well, outputting "handler 0 failed".

Is there a way to deal gracefully with this NGX_AGN from the ngx_rtmp_send_status?

atrottmann commented 10 years ago

I have now created a new "minimal" flash player consisting of a single Video object and the following code:

nc = new NetConnection(); nc.connect("rtmp://server/app"); ns = new NetStream(nc); my_video.attachVideo(ns); ns.setBufferTime(5); ns.play("mp4:/path/file.mp4");

setInterval(switcher, 10);

function switcher () { flag = !flag; ns.pause(flag); }

When I run this against an nginx-rtmp server that is connected to the client via Gigabit Ethernet, it usually works. When I run this against an nginx-rtmp server at the other end of a DSL line, it instantly fails. If I decrease the interval to 1 or 2, it fails quite quickly against the directly connected server as well.

When I run it against Wowza, it works.

The failure in nginx is, as far as I can see, always in ngx_rtmp_play_module.c when ngx_rtmp_play_pause calls ngx_rtmp_send_status(s, "...Notify", "....") and this one returns NGX_AGAIN.

Kind regards,

atrottmann commented 10 years ago

I have now reviewed everything I did in last year's debugging session as well as in this one, and came to the conclusion that it all is connected with the size of the output queue.

I have then googled and found this: https://groups.google.com/forum/#!topic/nginx-rtmp/ojCNqYxrQEU

And now, with a higher out_queue (384 instead of 256), the problem appears to be solved.

If you think this is the right way to solve such a special case (i.e. a flash-based player sending lots of commands at once), feel free to close the issue.

Kind regards,

Andreas Trottmann

atrottmann commented 9 years ago

After some more experience, it appears that the higher out_queue sometimes also does not help. Especially when there is a high network latency, it appears that nginx is queueing a/v packets until the queue is full, and then has no space anymore for control messages such as notifications about the stream starting.

I have seen that the ngx_rtmp_send_message function has an argument "priority". When this function is called, the prority is always set to 0, except when sending live a/v packets.

Setting this priority to 0 makes it so the rtmp message is sent in any case, except when the output queue is full.

Setting it to something greater than 0 makes it so the rtmp message is only sent if there is some space left in the queue.

I have patched my local installation of nginx-rtmp to set priority to 2 when sending on-demand a/v packets. According to my understanding, this makes it so there is always room for control messages (which are sent at priority 0), even when the client has not yet fetched the queued a/v packets.

This appears to have fixed the problem in a better way.

diff --git a/ngx_rtmp_flv_module.c b/ngx_rtmp_flv_module.c
index 4776e54..395cdbf 100644
--- a/ngx_rtmp_flv_module.c
+++ b/ngx_rtmp_flv_module.c
@@ -507,7 +507,7 @@ ngx_rtmp_flv_send(ngx_rtmp_session_t *s, ngx_file_t *f, ngx_uint_t *ts)

     ngx_rtmp_prepare_message(s, &h, ctx->msg_mask & (1 << h.type) ?
                              &lh : NULL, out);
-    rc = ngx_rtmp_send_message(s, out, 0);
+    rc = ngx_rtmp_send_message(s, out, 2); // this makes it so there are slots free for control messages
     ngx_rtmp_free_shared_chain(cscf, out);

     if (rc == NGX_AGAIN) {
diff --git a/ngx_rtmp_mp4_module.c b/ngx_rtmp_mp4_module.c
index 0259ca2..9b57284 100644
--- a/ngx_rtmp_mp4_module.c
+++ b/ngx_rtmp_mp4_module.c
@@ -2290,7 +2290,7 @@ ngx_rtmp_mp4_send(ngx_rtmp_session_t *s, ngx_file_t *f, ngx_uint_t *ts)
         out = ngx_rtmp_append_shared_bufs(cscf, NULL, &in);

         ngx_rtmp_prepare_message(s, &h, cr->not_first ? &lh : NULL, out);
-        rc = ngx_rtmp_send_message(s, out, 0);
+        rc = ngx_rtmp_send_message(s, out, 2); // this makes it so there are slots free for control messages
         ngx_rtmp_free_shared_chain(cscf, out);

         if (rc == NGX_AGAIN) {

Kind regards,

Andreas Trottmann