dva-re closed this issue 2 years ago
This may be due to poor performance. Check the CPU usage for each thread. https://airensoft.gitbook.io/ovenmediaengine/performance-tuning#performance-tuning
one stream, few viewers
Dech264 is an H.264 decoder. The decoder seems to be using more CPU than normal. Is the server receiving a very high bitrate stream?
And what version of OME are you using? The performance of MediaRouter has been improved in the latest version.
If you don't have many viewers, tuning the worker counts might help: set AppWorkerCount to around 2-4 and StreamWorkerCount to around 8.
image: airensoft/ovenmediaengine:0.14.3
and I didn't observe this issue on 0.14.2 before
I used to create a separate OutputProfile for each quality, but in the new version I switched to ABR. That is the only change on my side.
The current config was:
<AppWorkerCount>8</AppWorkerCount>
<StreamWorkerCount>8</StreamWorkerCount>
Now I will try:
<AppWorkerCount>4</AppWorkerCount>
<StreamWorkerCount>8</StreamWorkerCount>
ABR does not affect encoding performance. Only Playlist has been added. Please upload your entire Server.xml. Is it the same as before, except for Playlist?
Also, the log you posted in your first question shows that you have multiple streams on your server. Could performance be falling short only when there are multiple stream inputs? The per-thread CPU usage you captured was taken with a single stream input. Does the same problem occur with only one stream input? The captured CPU usage does not show a problem. It would help to capture per-thread CPU usage while the problem is occurring.
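One way to capture per-thread CPU usage on Linux while the problem is occurring is to sample `/proc/<pid>/task/*/stat` twice and compare the tick counters. The sketch below is only an illustration under that assumption, not an OME tool; the function names `parse_stat`, `sample`, and `per_thread_cpu` are hypothetical, and `pid` is the OvenMediaEngine process ID as seen from wherever the script runs (inside the container or on the host).

```python
import os
import time


def parse_stat(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    name = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); utime is field 14 and stime is field 15.
    return name, int(fields[11]) + int(fields[12])


def sample(pid):
    """Snapshot cumulative CPU ticks for every thread of the process."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                name, used = parse_stat(f.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
        ticks[(int(tid), name)] = used
    return ticks


def per_thread_cpu(pid, interval=1.0):
    """Return [((tid, name), percent_of_one_core), ...], busiest thread first."""
    hz = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    before = sample(pid)
    time.sleep(interval)
    after = sample(pid)
    usage = {
        key: 100.0 * (after[key] - before[key]) / (hz * interval)
        for key in after
        if key in before
    }
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `per_thread_cpu(pid)` and printing the first few entries shows which threads (for example a decoder or rescaler thread) are close to saturating a core at that moment.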
<?xml version="1.0" encoding="UTF-8"?>
<Server version="8">
<Name>OvenMediaEngine</Name>
<Type>origin</Type>
<IP>*</IP>
<StunServer>stun.l.google.com:19302</StunServer>
<Bind>
<Managers>
<API>
<TLSPort>8081</TLSPort>
<WorkerCount>2</WorkerCount>
</API>
</Managers>
<Providers>
<RTMP>
<Port>1935</Port>
<WorkerCount>5</WorkerCount>
</RTMP>
<SRT>
<Port>9999</Port>
<WorkerCount>5</WorkerCount>
</SRT>
<WebRTC>
<Signalling>
<TLSPort>3333</TLSPort>
<WorkerCount>2</WorkerCount>
</Signalling>
<IceCandidates>
<TcpRelay>*:3480</TcpRelay>
<TcpForce>false</TcpForce>
<IceCandidate>*:10006-10010/udp</IceCandidate>
<TcpRelayWorkerCount>1</TcpRelayWorkerCount>
</IceCandidates>
</WebRTC>
</Providers>
<Publishers>
<OVT>
<Port>9000</Port>
<WorkerCount>2</WorkerCount>
</OVT>
<WebRTC>
<Signalling>
<TLSPort>3333</TLSPort>
<WorkerCount>1</WorkerCount>
</Signalling>
<IceCandidates>
<TcpRelay>*:3480</TcpRelay>
<TcpForce>false</TcpForce>
<!-- in production in this place real IP -->
<IceCandidate>1.2.3.4:10010/udp</IceCandidate>
<TcpRelayWorkerCount>1</TcpRelayWorkerCount>
</IceCandidates>
</WebRTC>
</Publishers>
</Bind>
<Managers>
<Host>
<Names>
<Name>origin-server.domain.name</Name>
</Names>
<TLS>
<CertPath>/var/certs/domain.name/chain.pem</CertPath>
<KeyPath>/var/certs/domain.name/privkey.pem</KeyPath>
<ChainCertPath>/var/certs/domain.name/fullchain.pem</ChainCertPath>
</TLS>
</Host>
<API>
<AccessToken>abcd-real-token-instead</AccessToken>
</API>
</Managers>
<VirtualHosts>
<VirtualHost include="VHost*.xml"/>
<VirtualHost>
<Name>default</Name>
<Distribution>domain.name</Distribution>
<!-- Settings for multi ip/domain and TLS -->
<Host>
<Names>
<Name>origin-server.domain.name</Name>
<Name>192.168.10.2</Name>
</Names>
<TLS>
<CertPath>/var/certs/domain.name/chain.pem</CertPath>
<KeyPath>/var/certs/domain.name/privkey.pem</KeyPath>
<ChainCertPath>/var/certs/domain.name/fullchain.pem</ChainCertPath>
</TLS>
</Host>
<!-- Refer https://airensoft.gitbook.io/ovenmediaengine/signedpolicy -->
<SignedPolicy>
<PolicyQueryKeyName>pol</PolicyQueryKeyName>
<SignatureQueryKeyName>sig</SignatureQueryKeyName>
<SecretKey>SignRealSecretInstead</SecretKey>
<Enables>
<Providers>rtmp,webrtc,srt</Providers>
<Publishers>webrtc</Publishers>
</Enables>
</SignedPolicy>
<!-- Settings for applications -->
<Applications>
<Application>
<Name>app</Name>
<!-- Application type (live/vod) -->
<Type>live</Type>
<OutputProfiles>
<OutputProfile>
<Name>abr</Name>
<OutputStreamName>${OriginStreamName}</OutputStreamName>
<Playlist>
<Name>for Webrtc</Name>
<FileName>abr</FileName>
<Options>
<WebRtcAutoAbr>true</WebRtcAutoAbr>
</Options>
<Rendition>
<Name>SD</Name>
<Video>480p</Video>
<Audio>opus</Audio>
</Rendition>
<Rendition>
<Name>HD</Name>
<Video>720p</Video>
<Audio>opus</Audio>
</Rendition>
<Rendition>
<Name>FHD</Name>
<Video>1080p</Video>
<Audio>opus</Audio>
</Rendition>
</Playlist>
<Playlist>
<Name>For bypass webrtc</Name>
<FileName>bp</FileName>
<Options>
<WebRtcAutoAbr>true</WebRtcAutoAbr>
</Options>
<Rendition>
<Name>FHD</Name>
<Video>original</Video>
<Audio>opus</Audio>
</Rendition>
</Playlist>
<Encodes>
<Video>
<Name>480p</Name>
<Codec>h264</Codec>
<Width>854</Width>
<Height>480</Height>
<Bitrate>1200000</Bitrate>
<Framerate>60</Framerate>
<Preset>medium</Preset>
</Video>
<Video>
<Name>720p</Name>
<Codec>h264</Codec>
<Width>1280</Width>
<Height>720</Height>
<Bitrate>2400000</Bitrate>
<Framerate>60</Framerate>
<Preset>medium</Preset>
</Video>
<Video>
<Name>1080p</Name>
<Codec>h264</Codec>
<Width>1920</Width>
<Height>1080</Height>
<Bitrate>3000000</Bitrate>
<Framerate>60</Framerate>
<Preset>medium</Preset>
</Video>
<Video>
<Name>original</Name>
<Bypass>true</Bypass>
</Video>
<Audio>
<Name>opus</Name>
<Codec>opus</Codec>
<Bitrate>128000</Bitrate>
<Samplerate>48000</Samplerate>
<Channel>2</Channel>
</Audio>
</Encodes>
</OutputProfile>
<!-- <OutputProfile>
<Name>bypass_stream</Name>
<OutputStreamName>${OriginStreamName}</OutputStreamName>
<Encodes>
<Audio>
<Bypass>true</Bypass>
</Audio>
<Video>
<Bypass>true</Bypass>
</Video>
<Audio>
<Codec>opus</Codec>
<Bitrate>128000</Bitrate>
<Samplerate>48000</Samplerate>
<Channel>2</Channel>
</Audio>
</Encodes>
</OutputProfile>
<OutputProfile>
<Name>720p</Name>
<OutputStreamName>${OriginStreamName}_720p</OutputStreamName>
<Encodes>
<Audio>
<Bypass>true</Bypass>
</Audio>
<Video>
<Codec>h264</Codec>
<Width>1280</Width>
<Height>720</Height>
<Bitrate>1800000</Bitrate>
<Framerate>30.0</Framerate>
</Video>
<Audio>
<Codec>opus</Codec>
<Bitrate>128000</Bitrate>
<Samplerate>48000</Samplerate>
<Channel>2</Channel>
</Audio>
</Encodes>
</OutputProfile>
<OutputProfile>
<Name>480p</Name>
<OutputStreamName>${OriginStreamName}_480p</OutputStreamName>
<Encodes>
<Audio>
<Bypass>true</Bypass>
</Audio>
<Video>
<Codec>h264</Codec>
<Width>854</Width>
<Height>480</Height>
<Bitrate>1000000</Bitrate>
<Framerate>30.0</Framerate>
</Video>
<Audio>
<Codec>opus</Codec>
<Bitrate>128000</Bitrate>
<Samplerate>48000</Samplerate>
<Channel>2</Channel>
</Audio>
</Encodes>
</OutputProfile>-->
<OutputProfile>
<Name>record</Name>
<OutputStreamName>${OriginStreamName}_record</OutputStreamName>
<Encodes>
<!--<Audio>
<Bypass>true</Bypass>
</Audio>-->
<Video>
<Codec>h264</Codec>
<Width>1280</Width>
<Height>720</Height>
<Bitrate>1200000</Bitrate>
<Framerate>14.0</Framerate>
</Video>
</Encodes>
</OutputProfile>
</OutputProfiles>
<Providers>
<OVT/>
<WebRTC/>
<RTMP/>
<SRT/>
<WebRTC>
<Timeout>30000</Timeout>
</WebRTC>
</Providers>
<Publishers>
<AppWorkerCount>8</AppWorkerCount>
<StreamWorkerCount>8</StreamWorkerCount>
<OVT/>
<WebRTC>
<Timeout>30000</Timeout>
<Rtx>false</Rtx>
<Ulpfec>false</Ulpfec>
<JitterBuffer>false</JitterBuffer>
</WebRTC>
<FILE>
<RootPath>/records</RootPath>
<FilePath>
/${VirtualHost}/${Application}/${Stream}/${StartTime:YYYYMMDDhhmmss}_${EndTime:YYYYMMDDhhmmss}.ts
</FilePath>
<InfoPath>/record.log.xml</InfoPath>
</FILE>
</Publishers>
</Application>
</Applications>
</VirtualHost>
</VirtualHosts>
</Server>
The problem came back after 4 days without it. AppWorkerCount is 4, StreamWorkerCount is 8.
ome_1 | [2022-07-18 11:51:16.214] W [SPRTMP-T1935:30] ov.Queue | queue.h:268 | [0x5579be845890] #default#app - Mediarouter inbound indicator (1/4) size has exceeded the threshold: queue: 78170, threshold: 500, peak: 78170
ome_1 | [2022-07-18 11:51:17.314] W [SPRTMP-T1935:30] ov.Queue | queue.h:268 | [0x7fd190015f98] #default#app/main1_stream-MR-Inbound size has exceeded the threshold: queue: 78275, threshold: 100, peak: 78275
ome_1 | [2022-07-18 11:51:21.234] W [SPRTMP-T1935:30] ov.Queue | queue.h:268 | [0x5579be845890] #default#app - Mediarouter inbound indicator (1/4) size has exceeded the threshold: queue: 78656, threshold: 500, peak: 78656
ome_1 | [2022-07-18 11:51:22.314] W [SPRTMP-T1935:30] ov.Queue | queue.h:268 | [0x7fd190015f98] #default#app/main1_stream-MR-Inbound size has exceeded the threshold: queue: 78761, threshold: 100, peak: 78761
ome_1 | [2022-07-18 11:51:26.254] W [SPRTMP-T1935:30] ov.Queue | queue.h:268 | [0x5579be845890] #default#app - Mediarouter inbound indicator (1/4) size has exceeded the threshold: queue: 79141, threshold: 500, peak: 79141
Now I cannot start streams at all. Before the failure there were only 2 streams (both over the local network): one from an ATEM Mini Pro via RTMP and a second from a webcam (OvenLiveKit) via ws/WebRTC,
and very few viewers, all connected to one of 4 edge servers.
@getroot please help
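To see how fast a queue is backing up, the `ov.Queue` warnings above can be parsed and compared over time. This is a minimal sketch that assumes the log lines keep exactly the format shown in this thread; the names `parse_line` and `growth_rates` are hypothetical helpers, not part of OME.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "size has exceeded the threshold" warnings as they appear above.
PATTERN = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\].*?"
    r"\[(?P<addr>0x[0-9a-f]+)\] (?P<name>.*?) size has exceeded the threshold: "
    r"queue: (?P<queue>\d+), threshold: (?P<threshold>\d+), peak: (?P<peak>\d+)"
)


def parse_line(line):
    """Return a dict for one queue warning, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        "addr": m["addr"],
        "name": m["name"],
        "queue": int(m["queue"]),
        "threshold": int(m["threshold"]),
        "peak": int(m["peak"]),
    }


def growth_rates(lines):
    """Return {queue_address: items_per_second} between first and last warning."""
    samples = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            samples[rec["addr"]].append(rec)
    rates = {}
    for addr, recs in samples.items():
        if len(recs) < 2:
            continue  # need at least two samples to estimate growth
        recs.sort(key=lambda r: r["ts"])
        seconds = (recs[-1]["ts"] - recs[0]["ts"]).total_seconds()
        if seconds > 0:
            rates[addr] = (recs[-1]["queue"] - recs[0]["queue"]) / seconds
    return rates
```

A rate that stays positive means the consumer of that queue is not draining it at all or is draining it more slowly than packets arrive, which is consistent with a stuck or starved thread rather than a brief spike.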
Hmm... this feels like it's holding a thread somewhere. There are still not enough clues to analyze the exact cause.
First of all, what do you mean when you say the stream cannot be started? Does stream creation fail for all three of RTMP, WebRTC, and SRT? Does that mean OME has stopped completely? Please test this and let me know the results.
And please upload the log files of the last 4 days.
Over ws, there was no response from the server when trying to start broadcasting (it was aborted by timeout). I did not have time to try the other broadcasting options; I had to fix the situation and restart the server. So far, we are working around this with a docker compose down and up again.
I saved the log. Please, just in case, in order not to disclose sensitive data, can I send you the log files somewhere (matrix, email, etc.) in a private message?
Thank you.
please send log files to support@airensoft.com. Thank you!
Sent. Thank you very much in advance.
I have received them. We will comment on the results after analysis. Thank you.
@getroot Also, I found a saved log from July 14, when the failures started. It seems to show why it is not possible to create a new stream (Reject stream creation), or something else that will help find the answer. Sent in a separate email.
There is one more piece of information I need. Is your Edge running the same version as Origin?
Yes, they are all on version 0.14.3
OK Thank you, now we are figuring out the cause of the problem.
@dva-re
Let me ask you a few more questions.
1) What CPU are you using? clock, number of cores
2) What is the average CPU usage when there are no failures?
It might help me to understand the log.
Thanks
@Keukhan hi
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz
Stepping: 7
CPU MHz: 1900.005
CPU max MHz: 3500.0000
CPU min MHz: 1000.0000
BogoMIPS: 5000.00
Virtualization: VT-x
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 8 MiB
L3 cache: 11 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
current load (without problems)
@dva-re I just found a way to reproduce the hang problem. Thank you very much for your help, and I will tell you again when the bug is fixed!
Thank you!! I will wait with impatience and hope.
While it hasn't been fixed yet, is there anything I can do to minimize the chance of it happening?
This is reproduced when the network between Origin and Edge is not fast enough. In other words, if the network throughput between Origin and Edge is higher than the combined bitrate of all tracks in the stream that Edge pulls from Origin, it will not be reproduced. When the problem is reproduced, the socket thread is blocked and is no longer available.
This will probably be fixed and committed today or tomorrow. And we will release 0.14.4 quickly.
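As a rough check of the condition described above, the encoded track bitrates from the posted Server.xml can be summed and compared with the measured Origin-to-Edge throughput. This is only an arithmetic sketch: the bitrate of the bypassed `original` track is not stated anywhere in the thread, so the value used for it here is a placeholder assumption, and the `headroom` factor and function names are hypothetical.

```python
def total_stream_bps(track_bitrates):
    """Sum the bitrates (bits per second) of every track in the stream."""
    return sum(track_bitrates.values())


def link_keeps_up(link_bps, track_bitrates, headroom=1.2):
    """True if the link can carry all tracks with some margin for overhead.

    headroom is an assumed safety factor, not a value taken from OME.
    """
    return link_bps >= total_stream_bps(track_bitrates) * headroom


# Encoded tracks as configured in the posted Server.xml.
tracks = {
    "480p": 1_200_000,
    "720p": 2_400_000,
    "1080p": 3_000_000,
    "opus": 128_000,
    # ASSUMPTION: the bypass track carries the encoder's input bitrate,
    # which is not given in the thread; replace with the real value.
    "original": 6_000_000,
}
```

With these numbers, `total_stream_bps(tracks)` is 12,728,000 bit/s per pulled stream, so each Edge needs comfortably more than that from Origin for every stream it pulls.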
Guys, thank you very VERY much. Can you please let me know when this commit happens? I will switch to the dev branch before 0.14.4 is released
@dva-re I fixed this problem and will release a new version when the stress test is completed. https://github.com/AirenSoft/OvenMediaEngine/commit/fd74fec49b2e5dd16bb868a131800714b72c9f40
Thank you!
Hello.
@getroot Maybe you can help me with this too?
As soon as I launched the third broadcast, it immediately became like this and began to slow down.
ome_1 | [2022-08-04 15:07:15.124] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1424, threshold: 120, peak: 1425
ome_1 | [2022-08-04 15:07:16.310] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1945, threshold: 120, peak: 1945
ome_1 | [2022-08-04 15:07:19.503] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1780, threshold: 120, peak: 1780
ome_1 | [2022-08-04 15:07:20.154] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1464, threshold: 120, peak: 1465
ome_1 | [2022-08-04 15:07:21.310] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1987, threshold: 120, peak: 1987
ome_1 | [2022-08-04 15:07:24.517] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1824, threshold: 120, peak: 1824
ome_1 | [2022-08-04 15:07:25.154] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1505, threshold: 120, peak: 1505
ome_1 | [2022-08-04 15:07:26.329] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 2031, threshold: 120, peak: 2031
ome_1 | [2022-08-04 15:07:29.533] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1866, threshold: 120, peak: 1866
ome_1 | [2022-08-04 15:07:30.184] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1549, threshold: 120, peak: 1550
ome_1 | [2022-08-04 15:07:31.334] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 2077, threshold: 120, peak: 2077
ome_1 | [2022-08-04 15:07:34.540] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1920, threshold: 120, peak: 1920
ome_1 | [2022-08-04 15:07:35.227] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1602, threshold: 120, peak: 1603
ome_1 | [2022-08-04 15:07:36.349] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 2130, threshold: 120, peak: 2130
ome_1 | [2022-08-04 15:07:39.554] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1975, threshold: 120, peak: 1975
ome_1 | [2022-08-04 15:07:40.249] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1653, threshold: 120, peak: 1656
ome_1 | [2022-08-04 15:07:41.373] W [Rescaler:192] ov.Queue | queue.h:268 | [0x7f5c995d9c38] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 2184, threshold: 120, peak: 2185
ome_1 | [2022-08-04 15:07:44.601] W [Rescaler:786] ov.Queue | queue.h:268 | [0x7f5d6b6b8598] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 2027, threshold: 120, peak: 2029
ome_1 | [2022-08-04 15:07:45.289] W [Rescaler:882] ov.Queue | queue.h:268 | [0x7f5e4389ce98] Input queue of Encoder. codec(h264/27) size has exceeded the threshold: queue: 1712, threshold: 120, peak: 1714
The system load is not at the limit.
Does it make sense to increase some parameter in the config, or will only the GPU help me?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Please help. Any possible reasons why?
Streaming takes place over the local network (SRT). Several edges are then used on the global network, and they serve the stream to OvenPlayer.
early today