Closed liluxdev closed 11 years ago
P.S. But even when all is working well, I get this debug message in the log about once per second:
2012/09/29 18:55:53 [debug] 20356#0: *2 recv() not ready (11: Resource temporarily unavailable)
This is the "all working well" case log:
2012/09/29 18:55:53 [debug] 20356#0: *2 RTMP recv audio (8) csid=4 timestamp=1413492 mlen=262 msid=1 nbufs=1
2012/09/29 18:55:53 [debug] 20356#0: *2 nhandlers: 4
2012/09/29 18:55:53 [debug] 20356#0: *2 calling handler 0
2012/09/29 18:55:53 [debug] 20356#0: *2 record: day frame: mlen=262
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 13, 00007FFF6AAE1FA0, 11, 14569382
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 13, 0000000000D108F1, 262, 14569393
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 13, 00007FFF6AAE1FA0, 4, 14569655
2012/09/29 18:55:53 [debug] 20356#0: *2 record: hour frame: mlen=262
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 14, 00007FFF6AAE1FA0, 11, 14569382
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 14, 0000000000D108F1, 262, 14569393
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 14, 00007FFF6AAE1FA0, 4, 14569655
2012/09/29 18:55:53 [debug] 20356#0: *2 record: five frame: mlen=262
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 3, 00007FFF6AAE1FA0, 11, 1841232
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 3, 0000000000D108F1, 262, 1841243
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 3, 00007FFF6AAE1FA0, 4, 1841505
2012/09/29 18:55:53 [debug] 20356#0: *2 record: min frame: mlen=262
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 15, 00007FFF6AAE1FA0, 11, 552628
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 15, 0000000000D108F1, 262, 552639
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 15, 00007FFF6AAE1FA0, 4, 552901
2012/09/29 18:55:53 [debug] 20356#0: *2 record: default frame: mlen=262
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 17, 00007FFF6AAE1FA0, 11, 14569382
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 17, 0000000000D108F1, 262, 14569393
2012/09/29 18:55:53 [debug] 20356#0: *2 write: 17, 00007FFF6AAE1FA0, 4, 14569655
2012/09/29 18:55:53 [debug] 20356#0: *2 calling handler 1
2012/09/29 18:55:53 [debug] 20356#0: *2 live: av: audio timestamp=1413492
2012/09/29 18:55:53 [debug] 20356#0: *2 RTMP prep audio (8) fmt=1 csid=6 timestamp=26 mlen=262 msid=1 nbufs=1
2012/09/29 18:55:53 [debug] 20356#0: *596 RTMP send nmsg=1, priority=1 #42
2012/09/29 18:55:53 [debug] 20356#0: *596 send: fd:20 270 of 270
2012/09/29 18:55:53 [debug] 20356#0: posted event 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: *264 RTMP send nmsg=1, priority=1 #9
2012/09/29 18:55:53 [debug] 20356#0: *264 send: fd:19 270 of 270
2012/09/29 18:55:53 [debug] 20356#0: posted event 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: *253 RTMP send nmsg=1, priority=1 #217
2012/09/29 18:55:53 [debug] 20356#0: *253 send: fd:12 270 of 270
2012/09/29 18:55:53 [debug] 20356#0: posted event 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: *135 RTMP send nmsg=1, priority=1 #50
2012/09/29 18:55:53 [debug] 20356#0: *135 send: fd:16 270 of 270
2012/09/29 18:55:53 [debug] 20356#0: posted event 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: *59 RTMP send nmsg=1, priority=1 #208
2012/09/29 18:55:53 [debug] 20356#0: *59 send: fd:18 270 of 270
2012/09/29 18:55:53 [debug] 20356#0: posted event 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: *2 calling handler 2
2012/09/29 18:55:53 [debug] 20356#0: *2 calling handler 3
2012/09/29 18:55:53 [debug] 20356#0: *2 recv: fd:11 -1 of 280
2012/09/29 18:55:53 [debug] 20356#0: *2 recv() not ready (11: Resource temporarily unavailable)
2012/09/29 18:55:53 [debug] 20356#0: timer delta: 4
2012/09/29 18:55:53 [debug] 20356#0: posted events 0000000000000000
2012/09/29 18:55:53 [debug] 20356#0: worker cycle
2012/09/29 18:55:53 [debug] 20356#0: epoll timer: -1
P.S. To reproduce, it is important to wait for the FMLE timeout, so try to keep the network down on the encoder for about 30 seconds, not 10 as I said before.
Have you tested with ping/ping_timeout set on the server side?
@kghost how do I set this?
My nginx-rtmp has become completely unusable! Now no one can publish, even after a reboot... and yesterday I launched it in production :(
Current log is:
2012/10/02 15:16:14 [debug] 18040#0: *58 AMF cmd 'receiveaudio' no handler
2012/10/02 15:16:14 [debug] 18040#0: *58 reusing formerly read data: 11
2012/10/02 15:16:14 [debug] 18040#0: *58 RTMP bheader fmt=3 csid=2
2012/10/02 15:16:14 [debug] 18040#0: *58 RTMP mheader fmt=3 user (4) time=16309451+0 mlen=10 len=0 msid=0
2012/10/02 15:16:14 [debug] 18040#0: *58 RTMP recv user (4) csid=2 timestamp=16309451 mlen=10 msid=0 nbufs=1
2012/10/02 15:16:14 [debug] 18040#0: *58 nhandlers: 1
2012/10/02 15:16:14 [debug] 18040#0: *58 calling handler 0
2012/10/02 15:16:14 [debug] 18040#0: *58 RTMP recv user evt set_buflen (3)
2012/10/02 15:16:14 [debug] 18040#0: *58 msid=1 buflen: 3000 (msec)
2012/10/02 15:16:14 [debug] 18040#0: *58 recv: fd:13 -1 of 146
2012/10/02 15:16:14 [debug] 18040#0: *58 recv() not ready (11: Resource temporarily unavailable)
2012/10/02 15:16:14 [debug] 18040#0: timer delta: 498
2012/10/02 15:16:14 [debug] 18040#0: posted events 0000000000000000
2012/10/02 15:16:14 [debug] 18040#0: worker cycle
2012/10/02 15:16:14 [debug] 18040#0: epoll timer: -1
PLEASE HELP ME: I didn't touch the conf, but since two hours ago, even if I reboot everything, the streaming server doesn't work. I can't understand what's happening.
**UPDATE:** it wasn't related to this bug; it was caused by a wrong on_publish URL.
Sorry for the late answer. You should use ping or TCP keepalive.
Ping example (within the server{} block):

```nginx
server {
    ...
    ping 10s;
    ...
}
```
The 10 seconds can be changed to any other value of your choice.
keepalive works the same way as in nginx/http, but I think it's not needed here.
When the network gets disconnected the way you have described, the TCP connection usually hangs for a long time until the keepalive/retransmission timeout expires. RTMP ping fixes that.
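For completeness, here is a sketch combining ping with its companion ping_timeout directive (both exist in nginx-rtmp; the exact values here are illustrative, not taken from this thread):

```nginx
server {
    listen 1935;

    # Send an RTMP ping every 60s; if no reply arrives within 30s,
    # the connection is dropped. This detects a dead publisher much
    # faster than waiting for TCP keepalive/retransmission timeouts.
    ping 1m;
    ping_timeout 30s;
}
```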
Thanks, very useful. I've set it to 5s now; do you think that is too short?
P.S. I solved my last problem: today I migrated my customer's site domain to the production one, so I had:
old: beta.radioabc.domain, new: radioabc.domain
redirected with a "moved permanently" HTTP redirect!
But I still had in nginx-rtmp:
on_publish http://beta.radioabc.domain/pub.php
on_play http://beta.radioabc.domain/play.php
and the bad HTTP response code (redirect, moved permanently) blocked everything from starting up: at first new "plays" were silently blocked, and when I desperately rebooted, the entire publishing was blocked too.
The important thing is that now I understand what happened :)
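To illustrate the pitfall: nginx-rtmp treats a non-2xx response from an on_publish/on_play callback as a rejection, so an HTTP 301/302 redirect silently blocks publishing. A sketch of the corrected setup, using the production domain from my message (the application name is an assumption):

```nginx
application radioabc {
    live on;
    # The callback URL must answer 2xx directly; a "moved permanently"
    # redirect counts as a failed check and the publish/play is refused.
    on_publish http://radioabc.domain/pub.php;
    on_play    http://radioabc.domain/play.php;
}
```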
I'm using a 1-minute ping. 5 seconds is too low a value.
Hello. I've got a similar issue with Flash Live Encoder. I think it happens when the network on the encoder goes down for a few seconds. After that FLE tries to reconnect, but fails. Here is part of the FLE log:
Fri Nov 16 2012 10:05:54 : Primary - More than 50 % of the specified threshold values (Max buffer size 512000 KB or Max buffer duration 3600 sec) of RTMP buffer reached. Current network conditions are unfavorable to publish data at the specified bitrate. RTMP buffer will be flushed and streams will be re-published when the threshold value is reached.
Fri Nov 16 2012 10:17:54 : Primary - More than 70 % of the specified threshold values (Max buffer size 512000 KB or Max buffer duration 3600 sec) of RTMP buffer reached. Current network conditions are unfavorable to publish data at the specified bitrate. RTMP buffer will be flushed and streams will be re-published when the threshold value is reached.
Fri Nov 16 2012 10:29:54 : Primary - More than 90 % of the specified threshold values (Max buffer size 512000 KB or Max buffer duration 3600 sec) of RTMP buffer reached. Current network conditions are unfavorable to publish data at the specified bitrate. RTMP buffer will be flushed and streams will be re-published when the threshold value is reached.
Fri Nov 16 2012 10:35:54 : Primary - RTMP buffer exceeded the specified threshold values (Max buffer size 512000 KB or Max buffer duration 3600 sec). Flushing the RTMP buffer and re-publishing the streams.
Fri Nov 16 2012 10:35:59 : Primary - Network Status: NetConnection.Connect.Closed status
Fri Nov 16 2012 10:35:59 : Primary - Disconnected
Fri Nov 16 2012 10:36:00 : Primary - Connected to FMS/3,0,1,123
Fri Nov 16 2012 10:36:00 : Primary - Stream[0] Status: Success
Fri Nov 16 2012 10:36:01 : Primary - Stream[0] Status: NetStream.Publish.Start
Fri Nov 16 2012 10:36:01 : Primary - Network Status: NetConnection.Connect.Closed status
Fri Nov 16 2012 10:36:07 : Primary - Disconnected
Fri Nov 16 2012 10:36:09 : Primary - Re-establishing connection, attempt 1
Fri Nov 16 2012 10:36:09 : Primary - Reconnected to FMS/3,0,1,123
Fri Nov 16 2012 10:36:09 : Primary - Stream[0] Status: Success
Fri Nov 16 2012 10:36:09 : Primary - Stream[0] Status: NetStream.Publish.Start
Fri Nov 16 2012 10:36:12 : Primary - Network Status: NetConnection.Connect.Closed status
Fri Nov 16 2012 10:36:18 : Primary - Disconnected
And this part repeats until FLE is manually stopped and started:
Fri Nov 16 2012 10:36:09 : Primary - Re-establishing connection, attempt 1
Fri Nov 16 2012 10:36:09 : Primary - Reconnected to FMS/3,0,1,123
Fri Nov 16 2012 10:36:09 : Primary - Stream[0] Status: Success
Fri Nov 16 2012 10:36:09 : Primary - Stream[0] Status: NetStream.Publish.Start
Fri Nov 16 2012 10:36:12 : Primary - Network Status: NetConnection.Connect.Closed status
Fri Nov 16 2012 10:36:18 : Primary - Disconnected
Only after a manual restart does nginx start receiving the stream again. Nginx version 1.3.7, rtmp module git version from 15.11.2012. I also tried to set ping values from 10s to 3m; nothing helped.
Exactly the same log in my case! My problem wasn't solved by the ping directive: I have to do a manual restart of FMLE (a manual restart of nginx doesn't solve anything in my case, so now I think it is an FMLE bug).
Maybe it's an FMLE bug, but when it streams to Adobe FMS it reconnects normally.
Hmm... I'm wondering if the fix for https://github.com/arut/nginx-rtmp-module/issues/20 can help us. Have you tried the new-live3 branch?
Just compiled the new-live3 branch and set the drop_idle_publisher param. I'll test it and report whether it helps.
The module compiled from the new-live3 branch with the drop_idle_publisher param did not help.
Could you provide full debug log of nginx? Both before and after the network problem.
It will be quite difficult because it appears only after several hours of usage. I tried to reproduce it manually by unplugging the network cable for some seconds, but it didn't appear. I'll try more with debugging enabled, but I don't know if I'll catch it. Also, in the error log I've found the string "Dropping idle publisher ...", just after the disconnect. So according to this, the drop_idle_publisher param is working, but the result is the same.
It's OK for me to receive a big log.
Check this one: http://live.peopleair.tv/live/error_19112012. I'm not 100% sure that the "before" period is present here (the connection was lost at the same time I enabled the log), but the "after" period is present.
I confirm I cannot reproduce the bug by disconnecting the network cable either: it only happens after 5-7 hours of publishing, after the first small network outage (I can confirm the outage because I have a ping logger against google.com, and it also goes down for 2-3 seconds when the publisher loses the connection).
Here's the error in the log: 'RTMP in chunk stream too big: 64 >= 32'. Because of that error the connection is closed by nginx and started again by the other side (FMS). That repeats again and again.
The maximum chunk stream number in nginx-rtmp is 32 by default. It can be increased with the max_streams directive. However, I'm not sure that will help (you can still try).
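A minimal sketch of raising the limit (the error reported csid 64 against the default of 32, so the new limit has to exceed 64; 128 is the value SkeLLLa eventually used):

```nginx
rtmp {
    server {
        listen 1935;
        # Default maximum chunk stream number is 32; the log showed
        # "chunk stream too big: 64 >= 32", so raise the limit past 64.
        max_streams 128;
    }
}
```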
SkeLLLa, if you could sniff the network (with tcpdump or wireshark) at the moment when the above-mentioned error appears in the log, that would be just amazing! You just have to wait until that error shows up in the log and sniff 1-5 minutes of traffic.
There's another solution you can try. Write this in the server{} (not application{}!) block:

```nginx
publish_time_fix off;
play_time_fix off;
```
I've set the RTMP chunk size to 128, and also set publish_time_fix and play_time_fix as you recommended. But the stream still drops after several hours. I've got new errors (repeated many times after the stream drop):
2012/11/22 09:05:39 [info] 28919#0: *2 createStream, client: 176.36.227.154, server: 0.0.0.0:1935
2012/11/22 09:05:39 [info] 28919#0: *2 publish: name='0' args='videoKeyframeFrequency=5&totalDatarate=906' type=live silent=0, client: 176.36.227.154, server: 0.0.0.0:1935
2012/11/22 09:05:40 [info] 28919#0: *2 too big message: 1048576, client: 176.36.227.154, server: 0.0.0.0:1935
2012/11/22 09:05:40 [info] 28919#0: *2 disconnect, client: 176.36.227.154, server: 0.0.0.0:1935
2012/11/22 09:05:40 [info] 28919#0: *2 deleteStream, client: 176.36.227.154, server: 0.0.0.0:1935
Here is the dumped log: http://live.peopleair.tv/live/tcpdump_nginx
I've fixed that. Please:
1) check out the fmle-time-fix branch
2) remove the publish_time_fix/play_time_fix/max_streams lines from your config
If something goes wrong, please report (+dump). If it works OK, I will merge that branch into master.
Built a new nginx from fmle-time-fix; it seems that FMLE autoreconnected successfully (before, it stayed in the "down" state). I'll test it for some days to be sure the fix works. Many thanks.
After a day of testing, FMLE autoreconnects normally. I can't test any more, because we moved to linux+ffmpeg.
That's great, thanks for testing.
@arut, please let me know when you merge that into master and I will update and test it for a long period in my "production" and #77-affected environment (I only have a production environment on this project, so I want to update only from the master branch!).
OK, I'll report here
Thanks
I couldn't resist waiting :) From now on I'm testing the fmle-time-fix branch directly in my high-load environment; I'll report the result.
@arut The fix is awesome: for the first time I had no downtime and no need for a manual FMLE restart over 3 days (still running with no downtime: testing it in my production environment with a minimum of 100 connected clients and over 6000 listeners per day).
Great news. Thanks for your report. I'll merge it.
The discussion might suggest that if you use FMLE with the main branch (fmle-time-fix seems to have been merged into the main branch), you should not use publish_time_fix/play_time_fix/max_streams in the config. We found that this is not the case: for FMLE to work correctly with nginx-rtmp you still have to include publish_time_fix. We also use drop_idle_publisher, just in case.
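A minimal sketch of the application block described above (the application name and the drop_idle_publisher interval are our choices, not values prescribed by the module):

```nginx
application live {
    live on;
    # Still required for FMLE, even with fmle-time-fix merged.
    publish_time_fix on;
    # Safety net: drop publishers that stop sending data.
    drop_idle_publisher 10s;
}
```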
I'll check that later (I'm out of town now and only have a tablet with me). It's still possible that the directive is needed for FMLE but not for ffmpeg, which is the default and most common case.
I use "fmle-time-fix" ...
```
/home/root/nginx-rtmp-module-fmle-time-fix/ngx_rtmp.c: In function ‘ngx_rtmp_add_addrs’:
/home/root/nginx-rtmp-module-fmle-time-fix/ngx_rtmp.c:681: error: passing argument 2 of ‘ngx_sock_ntop’ makes integer from pointer without a cast
src/core/ngx_inet.h:110: note: expected ‘socklen_t’ but argument is of type ‘u_char *’
/home/root/nginx-rtmp-module-fmle-time-fix/ngx_rtmp.c:681: error: passing argument 3 of ‘ngx_sock_ntop’ makes pointer from integer without a cast
src/core/ngx_inet.h:110: note: expected ‘u_char *’ but argument is of type ‘long unsigned int’
/home/root/nginx-rtmp-module-fmle-time-fix/ngx_rtmp.c:681: error: too few arguments to function ‘ngx_sock_ntop’
make[1]: *** [objs/addon/nginx-rtmp-module-fmle-time-fix/ngx_rtmp.o] Error 1
make[1]: Leaving directory `/usr/local/nginx'
make: *** [build] Error 2
```
My system: CentOS 6.4.
That branch is pretty old. All the changes are now merged into master.
FMLE problems https://github.com/arut/nginx-rtmp-module/issues/354
Hi. Same problem here. If the emitter's connection breaks, it is impossible to transmit again until nginx is restarted. I've uploaded the debug log:
http://webminar.kernet.es:8080/error_reconectar_debug
And the conf is:

```nginx
worker_processes 14;

rtmp_auto_push on;
rtmp_auto_push_reconnect 1s;

rtmp {
    server {
        access_log /var/log/nginx/rtmp-access.log;
        listen 1935;
        ping 2s;
        ping_timeout 1s;
        notify_method get;
        max_streams 128;
        chunk_size 128;
        publish_time_fix off;

        application recepcion {
            live on;
            meta copy;
            allow publish all;
            allow play all;
        }
    }
}
```
Hello sir,
When I restart my internet connection, I lose the incoming RTMP connection to the server and have to reset my nginx server manually. What can fix this anomaly? My config:
```nginx
rtmp_auto_push on;

rtmp {
    server {
        listen 8001;
        push_reconnect 3s;
        interleave on;
        wait_video on;
        idle_streams off;
        chunk_size 4096;
        notify_method get;
        ping 10s;
        ping_timeout 600s;
    }
}
```
Thank you in advance for your help.
How to try to reproduce:
Please try it, because maybe it's just a problem of mine and not a bug, but I need help understanding what happens so I can try to fix it.
Description of my case:
I'm sending an audio stream to nginx-rtmp via Adobe Flash Media Live Encoder on a machine with a DSL connection. The DSL connection has been a little unstable these days: it goes down for a few seconds twice a day, but I realized that when a publisher connection is not properly closed, my nginx-rtmp instance becomes unstable and useless: I have to reboot nginx! Adobe Flash Media Live Encoder successfully reconnects to nginx-rtmp and says it is streaming, but no data packets are streamed anymore: I see the encoder connected in stats.xls. I have a
like the working situation, but
remains at the last numeric value before the flap until a reboot (of course real playback doesn't work anymore either), and the nginx debug log says the following: