Closed oleglab closed 1 year ago
Your ffmepg is encoding with x264. Is your hardware good enough for 100 encodings? Check your CPU stats. And is your network performance good enough?
And check if the kernel option is tuned. Check the socket buffer size and ulimit. https://airensoft.gitbook.io/ovenmediaengine/troubleshooting#5-2.-tuning-your-linux-kernel
@getroot we could scale out to 300 ffmpeg instances, but without trying to watch the video. As soon as we are trying to pull RTSP stream or watch the streams, the performance get slumped and origin get stuck... And streams might be overridden or pushed out by new ones:
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_application.cpp:255 | [#default#push/load73(100)] Trying to create a stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_stream.cpp:54 | [#default#push/load73(100)] Trying to create a mediarouter stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Transcoder | transcoder_stream.cpp:626 | [#default#push/load73(100)] Output stream has been created. [#default#push/load73(588761627)]
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Transcoder | transcoder_stream.cpp:86 | No decoder generated
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_application.cpp:255 | [#default#push/load73(588761627)] Trying to create a stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_stream.cpp:54 | [#default#push/load73(588761627)] Trying to create a mediarouter stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Monitor | application_metrics.cpp:57 | Create StreamMetrics(load73/80d32c8a-d2d7-4807-b8bc-7146afb6f075/default/#default#push/load73/o) for monitoring
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Transcoder | transcoder_stream.cpp:92 | [#default#push/load73(100)] Transcoder stream has been started.
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Transcoder | transcoder_stream.cpp:1371 | [#default#push/load73(100)] -> [#default#push/load73(588761627)] Transcoder output stream has been deleted.
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_application.cpp:480 | [#default#push/load73(588761627)] Trying to delete a stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Monitor | application_metrics.cpp:69 | Delete StreamMetrics(load73/80d32c8a-d2d7-4807-b8bc-7146afb6f075/default/#default#push/load73/o) for monitoring
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Monitor | stream_metrics.cpp:31 |
local-stream_1 | [Stream Info]
local-stream_1 | id(588761627), msid(0), output(load73), SourceType(Transcoder), RepresentationType(Source), Created Time (Thu Jan 12 20:03:23 2023) UUID(80d32c8a-d2d7-4807-b8bc-7146afb6f075/default/#default#push/load73/o)
local-stream_1 | >> Origin Stream Info
local-stream_1 | id(100), output(load73), SourceType(Rtmp), Created Time (Thu Jan 12 20:02:59 2023)
local-stream_1 |
local-stream_1 | Video Track #0: Name(bypass_video) Bitrate(0b) Codec(1,H264,Passthrough) BSF(H264_AVCC) Resolution(640x380) Framerate(30.00fps) KeyInterval(0) BFrames(0) timebase(1/1000)
local-stream_1 | Data Track #1: Name(E9bZFTWC) Codec(0,Unknown,Passthrough) BSF(Unknown) timebase(1/1000)
local-stream_1 |
local-stream_1 | >> Statistics
local-stream_1 | Last update time : Thu Jan 12 20:03:23 2023, Last sent time : Thu Jan 12 20:03:23 2023
local-stream_1 | Bytes in : 0B, Bytes out : 0B, Concurrent connections : 0, Max connections : 0 (Thu Jan 12 20:03:23 2023)
local-stream_1 |
local-stream_1 | >>>> By publisher
local-stream_1 | - Unknown : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - WebRTC : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - MPEGTSPush : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - RTMPPush : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - HLS : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - DASH : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - LLDASH : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - LLHLS : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - OVT : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - File : Bytes out(0B) Concurrent Connections (0)
local-stream_1 | - Thumbnail : Bytes out(0B) Concurrent Connections (0)
local-stream_1 |
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_stream.cpp:72 | [#default#push/load73(588761627)] Delete mediarouter stream
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] Transcoder | transcoder_stream.cpp:106 | [#default#push/load73(100)] Transcoder stream has been stopped.
local-stream_1 | [2023-01-12 20:03:23.967] I [SPRTMP-T1935:60] MediaRouter | mediarouter_stream.cpp:72 | [#default#push/load73(100)] Delete mediarouter stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621553->0
local-stream_1 | [2023-01-12 20:03:23.968] W [SPRTMP-T1935:60] ov.Queue | queue.h:268 | [0x7f20b2aab058] #default#push/ca315-MR-Inbound size has exceeded the threshold: queue: 100, threshold: 100, peak: 100
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621554->33
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621555->67
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621556->100
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621557->133
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621558->167
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621559->200
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621560->233
local-stream_1 | [2023-01-12 20:03:23.968] W [OutboundWorker:66] MediaRouter | mediarouter_stream.cpp:1039 | [#default#push/ca315] Detected out of order DTS of packet. track_id:0 dts:7621561->267
local-stream_1 | [2023-01-12 20:03:23.968] W [InboundWorker:64] MediaRouter | mediarouter_stream.cpp:799 | [#default#push/ca315(100)] b-frame has been detected in the 0 track of inbound stream
Have you tuned the network kernel values? Please check this first. And it's hard to analyze the problem with just that part of the log. I often test 200+ rtmp inputs and I haven't encountered the same problem yet. When you encode with ffmpeg, if you set the bitrate to cbr 1 meg, does the result change? And what about the cpu usage per thread of the ome process when the problem occurs?
@getroot I tried so many different cases, with default kernel, and with setup suggested in this bug. I don't think that traffic is an issue, 100 ffmpeg doesn't create a lot of load actually. We modified our test case to play pre-recorded video instead of encoding it with test stream.
We were able to feed 600 rtmps without traffic from the edges, but the problems start with 64+ rtmp streams:
1) streams get pushed out by new ones. 2) with traffic from edges origin gets stuck. I think access control hook has to be enabled to reproduce it.
I am going to send you a video with our steps and setup to reproduce the issue in email.
Here is the kernel settings that we apply. The only difference is net.ipv4.tcp_fin_timeout=30 to be safe with GCP NAT:
sysctl -w \
net.core.somaxconn=65535 \
net.ipv4.tcp_congestion_control=bbr \
net.ipv4.tcp_max_tw_buckets=1440000 \
net.ipv4.ip_local_port_range="1024 65000" \
net.ipv4.tcp_window_scaling=1 \
net.ipv4.tcp_max_syn_backlog=3240000 \
net.core.rmem_max=16777216 \
net.core.wmem_max=16777216 \
net.core.rmem_default=16777216 \
net.core.wmem_default=16777216 \
net.core.optmem_max=40960 \
net.ipv4.tcp_rmem="4096 87380 16777216" \
net.ipv4.tcp_wmem="4096 65536 16777216" \
net.core.netdev_max_backlog=50000 \
net.ipv4.tcp_max_syn_backlog=30000 \
net.ipv4.tcp_max_tw_buckets=2000000 \
net.ipv4.tcp_tw_reuse=1 \
net.ipv4.tcp_fin_timeout=30 \
net.ipv4.tcp_slow_start_after_idle=0 \
net.ipv4.tcp_keepalive_time=7200 \
net.ipv4.tcp_keepalive_intvl=75 \
net.ipv4.tcp_keepalive_probes=9
Thanks for the detailed report. I'll look into this.
I saw the video and configuration file in the email you sent. I'm a bit confused on how to reproduce it because the contents of this article and the email are different.
Could you explain the repro path again step by step? Your email is below.
Set one OvenMediaEngine
Enable RTMP Provider and RTSP Pull Provider BlockDuplicateStreamName of RTMP is set to false Enable RTMP Push Publisher and WebRTC Publisher Disable SingedPolicy and AdmissionWebhooks
Play one stream through RTSP Pull (stream name is /push/rtsp)
Origin is rtsp://xxx:8554/dev-xxxxx (not OvenMediaEngine server)
Input RTMP streams to the OME using FFMPEG (stream name is /push/load$1)
/push/rtsp stream is overwritten by one stream from /push/load$1
And the log in the video is a bit incomprehensible. The player is trying to play /push/rtsp, but the log doesn't print anything about it, only a trail of trying and failing to play /push/load500. Is ws://localhost:3333/push/rtsp making a request to another OvenMediaEngine server that isn't getting recorded video? Is it composed of Origin-Edge, and only Origin appears in the video, and Edge has a different configuration? So, is the Server.xml you sent a configuration for Origin? Please clear up my confusion.
@getroot The idea was to provide you the simplest configuration which helps to reproduce the issue with 64+ streams, as you can see on the video, the actual number somewhere between 70 and 75.
1) Set one OvenMediaEngine:
Enable RTMP Provider and RTSP Pull Provider BlockDuplicateStreamName of RTMP is set to false Enable RTMP Push Publisher and WebRTC Publisher Disable SingedPolicy and AdmissionWebhooks
2) Open the pull stream in demo player and keep it running. Stream name /push/rtsp
3) Injest 64-70 RTMP streams to the OME using FFMPEG into the same application (stream name is /push/load$1)
4) Verify that /push/rtsp is still running
3) Injest 10 more RTMP streams to have 75 total
4) Verify that /push/rtsp was pushed out by rtmp/ffmpeg stream.
The same problem we experience in our cluster with multiple origins and edges: Origin performs very well while we start/injest ~64 RTMP streams, but as soon as it gets few more streams the OVT connections get dropped in cascade, origin get stuck and/or streams get mixed/pushed out.
Reproducible with both dev and latest docker containers
Step 1:
docker run -v $PWD/test-pull/Server.xml:/opt/ovenmediaengine/bin/origin_conf/Server.xml -p 1935:1935 -p 3333:3333 -p 3478:3478 -p 10006-10010:10006-10010 airensoft/ovenmediaengine:latest
Step 2: Verify that stream is running (Chrome preferably)
Step 3:
for i in {1..70}; do ./ff.sh $i; done
Step 4 (Repeat step 2): Verify that stream is still running
Step 5:
for i in {70..75}; do ./ff.sh $i; done
Step 6: Verify that stream is different
Thanks for the detailed reproduction path. I'll start this as soon as the task I'm doing now is finished.
Thanks for reporting the bug. I patched a problem where stream ids generated by different types of providers could be duplicated and committed it to the master branch. Please check if the image airensoft/ovenmediaengine:dev solves your problem.
Hi @getroot, thanks for the update. The dev build looks good! We couldn't reproduce the issue!
PROBLEM
Origin performes very well until we injest ~64 RTMP streams, but as soon as it gets few more connections the OVT connections get dropped in cascade.
EXPECTATION
The CPU and memory are barely utilized because we don't use transcoding, so it would be great to be able to scale it out more than 64 streams.
ENVIRONMENTAL INFORMATION
The latest docker
Dockerfile:
Just 100 rtmp streams generated by ffmpeg with h264:
Oven configuration:
no player is needed
Reproducible on Mac and Ubuntu with Docker above ^^^
SETUP INFORMATION AND LOGS
Please upload Server.xml. config.zip
[2023-01-12 00:16:18.620] W [OutboundWorker:31] MediaRouter | mediarouter_stream.cpp:1039 | [#dev#dev/streamXX] Detected out of order DTS of packet. track_id:0 dts:536201->536201 [2023-01-12 00:16:18.659] W [OutboundWorker:31] MediaRouter | mediarouter_stream.cpp:1039 | [#dev#dev/streamXX] Detected out of order DTS of packet. track_id:0 dts:536202->536201 [2023-01-12 00:16:51.457] I [SPRTMP-T1935:14] Provider | stream.cpp:49 | Unknown/(50) has been started stream [2023-01-12 00:16:51.458] I [SPRTMP-T1935:14] RTMPProvider | rtmp_provider.cpp:152 | A RTMP client has connected from <ClientSocket: 0x7f66d6fe5610, #50, Connected, TCP, Nonblocking, IP1:38518> [2023-01-12 00:16:51.722] I [SPRTMP-T1935:14] AccessController | access_controller.cpp:183 | AdmissionWebhooks queried http://{host}/v1/ac whether client 10.138.0.20:38518 could access rtmp://{host}:1935/dev/streamXX. (Result : Allow Elapsed : 8 ms) [2023-01-12 00:17:04.564] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2c610, #38, Closed, TCP, Nonblocking, IP2:26662> [2023-01-12 00:17:04.564] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2ec10, #41, Closed, TCP, Nonblocking, IP2:26698> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e05810, #32, Closed, TCP, Nonblocking, IP2:10424> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2fc10, #36, Closed, TCP, Nonblocking, IP2:26642> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2e610, #35, Closed, TCP, Nonblocking, IP2:26628> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2d210, #34, Closed, TCP, Nonblocking, IP2:10432> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e05610, #33, Closed, TCP, Nonblocking, IP2:10430> [2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2d610, #37, Closed, TCP, Nonblocking, IP2:26648> [2023-01-12 00:17:05.608] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:140 | OvtProvider is connected : <ClientSocket: 0x7f66d9e05610, #32, Connected, TCP, Nonblocking, IP2:24836> [2023-01-12 00:17:05.609] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e06410, #39, Closed, TCP, Nonblocking, IP2:26678> [2023-01-12 00:17:05.609] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2be10, #40, Closed, TCP, Nonblocking, IP2:26688> [2023-01-12 00:17:05.610] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:140 | OvtProvider is connected : <ClientSocket: 0x7f66d9e06410, #33, Connected, TCP, Nonblocking, IP2:24840>
OTHER HELPFUL INFORMATION
Please write any information that anyone viewing this issue can reference.