AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://OvenMediaEngine.com/ome
GNU Affero General Public License v3.0

High CPU usage #468

Closed fcqpl closed 2 years ago

fcqpl commented 3 years ago

Hello, I'm worried: is this CPU usage normal? OME has only 5 incoming streams and 1 outgoing. I'm afraid to use OME in production... Other solutions use ~20% CPU at most.

There is only one output profile with Hardware Acceleration enabled.

MobaXterm_KZwTsE7esF OME_config.md

getroot commented 3 years ago

I committed today applying SRTP_AEAD_AES_128_GCM to SRTP. (previously, SRTP_AEAD_AES_128_GCM was used)

And I confirmed that OME uses a slightly lower CPU in DO's 1 core / 1GB memory environment.

9724819adf2947439dd452c53d781e45f6fbe046

lee-hammer99 commented 3 years ago

@basisbit

Hello, I'm a member of the OvenMediaEngine team. Thank you very much for your contribution. While reading your comments, I found something interesting and quoted it.

Last weekend I had an event with ~ 32000 unique attendees based on OME master from 13th of July.

I know you are busy, but if you share your story and experience with us, I think OME will develop further. If you are interested, please send an email to contact@airensoft.com.

Thank you!

getroot commented 3 years ago

We recently committed several OvenMediaEngine optimizations.

Here are the test results. I confirmed that the CPU usage increase when sessions(Viewers) are added in the DO instance is mostly due to DTLS/SRTP encryption. I'm going to study this part a bit more in-depth (I'm not familiar with DTLS/SRTP yet).

[DO 1 core CPU / 1GB memory]

root@ubuntu-ome-test:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : DO-Regular
stepping        : 2
microcode       : 0x1
cpu MHz         : 2494.108
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4988.21
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

[our development server]

getroot@OME-Dev:~/project/OvenMediaEngine$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 33
model name      : AMD Ryzen 9 5900X 12-Core Processor
stepping        : 0
microcode       : 0xa201009
cpu MHz         : 4198.359
cache size      : 512 KB
physical id     : 0
siblings        : 24
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
bugs            : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 8399.31
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

getroot commented 3 years ago

We fixed an issue today in OvenMediaEngine's HLS, DASH Packetizer that could be very CPU intensive. If you run OME with the default settings, it seems that you have been affected by this problem because HLS and DASH are turned on by default.

Bansikov commented 2 years ago

Hi. Tested with the latest master, streaming FullHD at 1 Mbit from OBS:

  1. Only Webrtc output BYPASS => 1 Streamer +5%, 10 Viewers +5% = 10% CPU
  2. Only Webrtc output with OPUS codec ON => 1 Streamer +10%, 10 Viewers +10% = 20% CPU

So, with OPUS, each viewer needs 2x the processor? With 2 mbit the load increases by one and a half times.

getroot commented 2 years ago

@2002demon

The performance of the OPUS encoder is only affected by the number of incoming streams. It does not affect the output session. If you encode with OPUS, the bitrate will increase a bit by adding audio. Because WebRTC uses SRTP encryption, CPU usage increases slightly as the bitrate increases. But it's not doubling like you've experienced.

Did you run OME in release mode with "make release" command?

Bansikov commented 2 years ago

Yes.

make release
make install
systemctl start ovenmediaengine

Tested again with 2 Mbit:

BYPASS: streamer +8%, 10 viewers +10%, 25-30 Mbit traffic

With OPUS: streamer +15%, 10 viewers +15%, 25-30 Mbit traffic

Traffic is the same. Why does OPUS need so much CPU? It's just audio (((

getroot commented 2 years ago

It seems that you are definitely using more CPU than my environment. Most of the CPU usage of the WebRTC session is SRTP encryption. If the CPU clock is low, more CPU may be used.

Bansikov commented 2 years ago

I have an Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz. Yes, I have set performance mode for the CPU and now I get fine results.

With OPUS and 2 Mbit: 1 streamer 5%, 10 viewers 5%.

So I think CPU is no longer a problem here, because with my 800% CPU I can have:
50 streamers => 250% CPU
1000 viewers => 500% CPU
And that's OK.
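
The estimate above is a straight linear extrapolation. A minimal sketch of that arithmetic, assuming the measured 5% per streamer and 5% per 10 viewers keep holding at larger scale (real load rarely scales perfectly linearly); `estimate_cpu` is a hypothetical helper name, not part of OME:

```shell
#!/bin/bash
# Linear CPU estimate from the figures quoted in this comment.
# 100 means one fully used logical CPU, as in top/htop.

# estimate_cpu STREAMERS VIEWERS -> total CPU percent
estimate_cpu() {
    local streamers="$1"
    local viewers="$2"
    local per_streamer=5      # percent per streamer (measured figure from this thread)
    local per_10_viewers=5    # percent per 10 viewers (measured figure from this thread)
    echo $(( streamers * per_streamer + viewers * per_10_viewers / 10 ))
}

estimate_cpu 50 1000    # prints 750 (250 for streamers + 500 for viewers)
```

The 750% total would fit within the 800% available on that machine, with little headroom left, so it is worth load-testing rather than trusting the extrapolation.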

Why does the latest master not accept this: [Config] Unknown item found: Server.VirtualHosts.VirtualHost.VirtualHost.Applications.Application.Application.Publishers.StreamLoadBalancingThreadCount

And is 8 fine here, or what do I need to set?

                    <StreamLoadBalancingThreadCount>8</StreamLoadBalancingThreadCount>
                    <SessionLoadBalancingThreadCount>8</SessionLoadBalancingThreadCount>

getroot commented 2 years ago

Oh, I'm so glad you got a good result.

Its names have recently been changed to AppWorkerCount and StreamWorkerCount

Please refer to the URL below.

https://airensoft.gitbook.io/ovenmediaengine/performance-tuning#tuning-the-number-of-threads

It is recommended to increase AppWorkerCount when there are many input streams, and to increase StreamWorkerCount when there are many output viewers.

getroot commented 2 years ago

Kernel configuration also becomes very important once you exceed a few hundred sessions. Please refer to the following thread.

https://github.com/AirenSoft/OvenMediaEngine/issues/507

Bansikov commented 2 years ago

Thank you. Please fix the docs for Use-Case: https://imgs.su/upload/412/537002101.png

One last question, not related to CPU but very important for me: is it possible to get a list of active RTMP/WebRTC streams (app/path names), and maybe the number of connected WebRTC viewers? I have no idea how...

getroot commented 2 years ago

@2002demon Thanks, I fixed the documentation.

What you want is probably this. https://airensoft.gitbook.io/ovenmediaengine/rest-api/v1/virtualhost/application/stream
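
For reference, a hedged sketch of querying that stream-list endpoint with curl. The host, port, access token, and vhost/app names below are placeholders, and the path and Basic-auth scheme reflect one reading of the linked REST API documentation, so verify them against your own Server.xml before relying on this. `streams_url` and `list_streams` are hypothetical helper names:

```shell
#!/bin/bash
# Placeholder values: adjust to the API section of your Server.xml.
OME_API="http://127.0.0.1:8081"
OME_TOKEN="your-access-token"

# streams_url VHOST APP -> URL of the stream-list endpoint
streams_url() {
    echo "${OME_API}/v1/vhosts/$1/apps/$2/streams"
}

# list_streams VHOST APP -> prints the JSON response from the server
list_streams() {
    local auth
    # The access token is sent base64-encoded in a Basic Authorization header.
    auth=$(printf '%s' "$OME_TOKEN" | base64 | tr -d '\n')
    curl -s -H "Authorization: Basic ${auth}" "$(streams_url "$1" "$2")"
}

# Example (needs a running OME with the API enabled):
#   list_streams default app
```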

fcqpl commented 2 years ago

Still too much CPU usage.

8 cores from E5-2620 @ 2.00GHz
~40% CPU usage with OME
36 incoming streams (RTMP), zero transcoding.
In publishers only WebRTC (1 client).

Changed docker-compose to nimble on same server, same incoming streams:
1-3% CPU usage with nimble
36 incoming streams (RTMP), no transcoding.
Nimble has enabled output HLS, SLDP, DASH.

Server.xml.md

getroot commented 2 years ago

@fcqpl

When I put 100 RTMP inputs (each 4Mbps) in my development server, the CPU is used as shown in the screenshot below. (Of course, the CPU clock rate of my development server is very high.) According to the CPU usage per thread, it can be seen that the rtmp module is using about 44% of the CPU (0.44% per input). The OutboundWorker and InboundWorker are responsible for parsing and converting the bitstream of the codec. This is something OME sometimes doesn't have to do, but it's something OME handles for more reliable streaming and scalability. AppWorker (AW-xxx) and StreamWorker are threads that are created for more input and output, and just exist, using 1% to 2% of CPU and waiting. (Therefore, reducing this number of threads has the effect of reducing CPU usage, but reducing the maximum capacity.)

I will continue to optimize OvenMediaEngine in the future, but for the services I manage, the current performance is showing sufficient performance to provide various commercial services.

I've never used nimble, but from your tests, nimble is a really great streaming server.

(I used ffmpeg to simulate 100 rtmp inputs. If you look at the CPU usage below, you can see that ffmpeg uses 1-2% per session just by reading a file and sending rtmp without encoding. Nimble is really amazing)


AMD Ryzen 9 5900X 12-Core Processor

image

image

My ffmpeg command script for 100 rtmp input

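getroot@OME-Dev:~$ cat send.sh
#!/bin/bash

# Usage: send.sh START END
# Starts one RTMP publisher per index from START to END-1.
if [ -z "$1" ] || [ -z "$2" ]
then
        echo "Usage: $0 start_index end_index"
        exit 1
fi

for ((i=$1;i<$2;i++))
do
        # Loop 2.mp4 forever and push it as RTMP without re-encoding
        ffmpeg -hide_banner -loglevel error -re -stream_loop -1 -i 2.mp4 -c:v copy -c:a copy -f flv "rtmp://192.168.0.160:11935/app/stream$i" &
        # Stagger the start-up slightly so the connections do not all arrive at once
        sleep .1
done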

Server.xml.md

fcqpl commented 2 years ago

Moved to newer CPU to test: 1x Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz (4 core)

36 incoming streams (RTMP), zero transcoding.
~40% CPU usage with OME on 4 cores
In publishers only WebRTC (1 client).
putty_IQxuNGNFCC

Replaced OME with nginx-rtmp:
~6% CPU usage on 4 cores
1 client by RTMP
putty_ggRgeHzQkS

the current performance is showing sufficient performance to provide various commercial services.

I can't agree.

I've never used nimble, but from your tests, nimble is a really great streaming server.

To be clear: I'm not affiliated with Softvelum LLC and I'm not advertising their solution. I'm only a user of their product (as well as nginx-rtmp). I would like to move all of our streaming to OME, but the high CPU usage does not allow it.

fcqpl commented 2 years ago

8 cores from Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz

OME: putty_KKHkg2otFD

basisbit commented 2 years ago

@fcqpl I doubt you'll make any progress here by posting htop screenshots. Instead, I'd suggest you further adjust your config to disable anything you don't need and figure out what is causing the unusually high CPU usage for you. Check which code version you are compiling with which settings, check your virtualization environment, and so on. If you make no progress with that, try to reproduce the much better results that others here reported on a typical cheap virtual machine, for example a DigitalOcean virtual server.

fcqpl commented 2 years ago

@basisbit It's the airensoft/ovenmediaengine:0.12.5 image; I'm not compiling it. My config was posted here and it's minimal: Opus transcoding, HLS, DASH, thumbnails etc. are disabled. Server.xml.md

I can't do anything else; the problem is definitely poorly optimized code.

On a DO droplet with 2 vCPU ("Premium Intel with NVMe SSD"), with OVT and SRT removed from the config: putty_5B0kPO6jAb putty_pBy2WCyLrB and... still too much.

getroot commented 2 years ago

@fcqpl As you can see, my environment is using 76.2% CPU at 100 RTMP inputs, and your environment is using 172% at 36 inputs.

Looking at your settings, StreamWorkerCount is set to 8. I set it to 1. StreamWorkerCount is the number of threads created per stream. So in your case there are 288 StreamWorkers running, each doing nothing but using 0.7% of the CPU. (This is probably due to excessive thread context switching.)

So you can save a lot of CPU usage by reducing StreamWorkerCount to 1. Also, for around 36 RTMP inputs, setting AppWorkerCount to 1 would be sufficient.
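
The thread count behind that advice is simple multiplication. A small sketch, assuming (as stated above) that StreamWorkerCount threads are created per stream and that each idle thread costs roughly 0.7% CPU; the helper names are hypothetical:

```shell
#!/bin/bash
# stream_worker_threads STREAMS STREAM_WORKER_COUNT -> number of StreamWorker threads
stream_worker_threads() {
    echo $(( $1 * $2 ))
}

# idle_overhead THREADS PERCENT_PER_THREAD -> approximate total CPU percent
idle_overhead() {
    LC_ALL=C awk -v n="$1" -v p="$2" 'BEGIN { printf "%.1f\n", n * p }'
}

stream_worker_threads 36 8    # prints 288
stream_worker_threads 36 1    # prints 36
idle_overhead 288 0.7         # prints 201.6
idle_overhead 36 0.7          # prints 25.2
```

The 0.7% figure is a rounded per-thread reading, so the product is only an order-of-magnitude check and is not expected to match the measured 172% exactly.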

getroot commented 2 years ago

@fcqpl As you captured last time, you can check which threads are using the CPU excessively by checking with top -H -p [pid]. In your case you can see that there are too many StreamWorkers created. It can be seen that the CPU usage of AW-WebRTC (AppWorker for WebRTC) and SPRTMP-T19365 (SocketPool Thread for RTMP) is normal.
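
With hundreds of threads, the per-thread view is easier to read when grouped by thread name. A rough sketch that sums the output of top in batch mode; it assumes the default top column layout (%CPU in column 9, thread name in column 12) and that the process is named OvenMediaEngine, both of which should be checked on your system. `sum_cpu_by_thread` is a hypothetical helper:

```shell
#!/bin/bash
# sum_cpu_by_thread: reads `top -H -b -n 1` output on stdin and prints
# "thread_name thread_count total_cpu_percent", highest total first.
sum_cpu_by_thread() {
    LC_ALL=C awk '
        # Data rows start with a numeric PID; header lines are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 12 { cpu[$12] += $9; n[$12]++ }
        END { for (t in cpu) printf "%s %d %.1f\n", t, n[t], cpu[t] }
    ' | sort -k3,3nr
}

# Example (assumes the process name; run on the OME host):
#   top -H -b -n 1 -p "$(pidof OvenMediaEngine)" | sum_cpu_by_thread
```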

I'm going to have the AppWorkerCount and StreamWorkerCount autoscale dynamically in the future. (Of course, this is a very difficult task, so it will take quite a bit of time.)

Bansikov commented 2 years ago

@getroot Nice to be able to see all the workers with top -H.

I'm trying to get a thumbnail from the video and the CPU usage is too high:
1 rtmp input => 8%
1 rtmp input + image jpeg codec => 25%
That is because Dec h264 uses 16% CPU. This does not suit me. I know these figures are with a cold processor, but overall it triples the CPU consumption.

My idea is to grab a thumbnail every minute with ffmpeg, but I can't access the RTMP stream. Is there no RTMP publisher here? How do I get the RTMP stream back? rtmp://server.com:1935/app/stream

getroot commented 2 years ago

@2002demon Please share your Server.xml. What are your jpeg encoding settings? In particular, what did you set the Framerate to?

OME does not provide RTMP streaming. You can pull HLS with ffmpeg to do what you want.

Bansikov commented 2 years ago

                                <Codec>jpeg</Codec>
                                <Framerate>1</Framerate>
                                <Width>240</Width>
                                <Height>135</Height>

I currently use StreamWorkerCount = 2.

OK, I'll try to get it over HLS.

Keukhan commented 2 years ago

@2002demon

The CPU usage that top currently shows for the Dech264 thread also includes the rescaler and JPEG encoder usage.

For accurate measurement, the OME structure was partially changed so that the CPU usage of the Decoder, Rescaler, and Encoder can be seen separately.

As the measurement results show, the RTMP share is less than 1%. The Decoder, Rescaler, and Encoder use most of the CPU. Also, the lower the resolution and frame rate of the thumbnail, the less CPU is used.

Therefore, since your encoding setting has a low resolution and a framerate of 1, the CPU used for JPEG encoding should be very small.

Please refer to it for your performance measurement.

Thanks.

[Environment]

CPU : AMD Ryzen 9 5900X 12-Core Processor 3.7GHz

Mem : 128G

[Input]

1080p / 30fps / 5Mbps

[Result]

jpg / 1080p / 30fps

3658475 soulk 20 0 12.4g 270532 137768 S 11.9 0.2 0:14.77 Encmjpeg
3658479 soulk 20 0 12.4g 270532 137768 S 11.9 0.2 0:12.46 Rescaler
3658442 soulk 20 0 12.4g 270532 137768 S 5.9 0.2 0:06.99 Dech264
3658441 soulk 20 0 12.4g 270532 137768 S 0.9 0.2 0:04.01 SPRTMP-T21935

jpg / 1080p / 15fps

3663838 soulk 20 0 12.3g 269740 137768 S 8.7 0.2 0:03.92 Dech264
3663869 soulk 20 0 12.3g 269740 137768 S 8.3 0.2 0:03.07 Encmjpeg
3663873 soulk 20 0 12.3g 269740 137768 S 6.0 0.2 0:02.84 Rescaler
3653841 soulk 20 0 12.4g 269740 137768 S 0.9 0.2 0:01.12 SPRTMP-T21935

jpg / 1080p / 1fps

3660502 soulk 20 0 12.4g 267040 137704 S 5.9 0.2 0:07.26 Dech264
3660538 soulk 20 0 12.4g 267040 137704 S 2.0 0.2 0:00.40 Encmjpeg
3660542 soulk 20 0 12.4g 267040 137704 S 2.0 0.2 0:00.83 Rescaler

jpg / 720p / 30fps

3668105 soulk 20 0 12.3g 261032 137768 S 12.6 0.2 0:04.57 Rescaler
3668101 soulk 20 0 12.3g 261032 137768 S 8.0 0.2 0:03.38 Encmjpeg
3668080 soulk 20 0 12.3g 261032 137768 S 7.3 0.2 0:03.21 Dech264

jpg / 720p / 15fps

3667106 soulk 20 0 12.3g 262352 137768 R 9.0 0.2 0:01.26 Dech264
3667132 soulk 20 0 12.3g 262352 137768 S 7.0 0.2 0:00.99 Rescaler
3667128 soulk 20 0 12.3g 262352 137768 S 4.0 0.2 0:00.68 Encmjpeg

jpg / 720p / 1fps

3665587 soulk 20 0 12.3g 260704 137704 S 10.0 0.2 0:04.20 Dech264
3665609 soulk 20 0 12.3g 260704 137704 S 1.0 0.2 0:00.20 Encmjpeg
3665613 soulk 20 0 12.3g 260704 137704 S 1.0 0.2 0:00.50 Rescaler

jpg / 480p / 30fps

3669491 soulk 20 0 12.3g 253376 137768 S 12.0 0.2 0:01.89 Rescaler
3669457 soulk 20 0 12.3g 253376 137768 S 8.0 0.2 0:01.45 Dech264
3669487 soulk 20 0 12.3g 253376 137768 S 5.7 0.2 0:00.92 Encmjpeg

jpg / 480p / 15fps

3670688 soulk 20 0 12.4g 257860 137768 S 6.3 0.2 0:04.94 Rescaler
3670652 soulk 20 0 12.4g 257860 137768 S 3.7 0.2 0:05.80 Dech264
3670684 soulk 20 0 12.4g 257860 137768 S 2.3 0.2 0:02.64 Encmjpeg
3672208 soulk 20 0 12.4g 257860 137704 S 0.8 0.2 0:01.01 SPRTMP-T21935

jpg / 480p / 1fps

3672230 soulk 20 0 12.3g 254108 137704 S 9.0 0.2 0:03.01 Dech264
3672258 soulk 20 0 12.3g 254108 137704 S 1.0 0.2 0:00.07 Encmjpeg
3672262 soulk 20 0 12.3g 254108 137704 S 1.0 0.2 0:00.35 Rescaler
3672208 soulk 20 0 12.4g 258328 137704 S 0.7 0.2 0:00.01 SPRTMP-T21935
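
To compare configurations, the three pipeline threads can be added up. A small sketch using the %CPU values from the 1080p results above (ninth column of each row); `pipeline_cpu` is a hypothetical helper, and single top samples like these are noisy:

```shell
#!/bin/bash
# pipeline_cpu DECODER RESCALER ENCODER -> combined CPU percent of the thumbnail pipeline
pipeline_cpu() {
    LC_ALL=C awk -v d="$1" -v r="$2" -v e="$3" 'BEGIN { printf "%.1f\n", d + r + e }'
}

pipeline_cpu 5.9 11.9 11.9   # jpg / 1080p / 30fps -> prints 29.7
pipeline_cpu 5.9 2.0 2.0     # jpg / 1080p / 1fps  -> prints 9.9
```

In these two samples the decoder reading is the same at 30 fps and 1 fps, so lowering the thumbnail frame rate mainly reduces the rescaler and encoder share.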

Bansikov commented 2 years ago

@Keukhan Getting a JPG from HLS with ffmpeg works well and doesn't use much CPU. I grab a JPG every 15 seconds, so it does not require constant re-encoding of the entire video.
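
A minimal sketch of that approach, assuming an HLS playlist URL for the stream (the URL in the example is a placeholder, and the 240x135 size is taken from the thumbnail settings posted earlier). `grab_thumbnail` is a hypothetical helper; the ffmpeg options used are standard ones:

```shell
#!/bin/bash
FFMPEG=${FFMPEG:-ffmpeg}   # overridable, e.g. FFMPEG=echo for a dry run

# grab_thumbnail URL OUTFILE: decode a single frame from the HLS stream,
# scale it to 240x135 and write it to OUTFILE (overwriting any old file).
grab_thumbnail() {
    "$FFMPEG" -hide_banner -loglevel error -y -i "$1" \
        -frames:v 1 -vf scale=240:135 "$2"
}

# Example loop (placeholder URL; one snapshot every 15 seconds):
#   HLS_URL="http://server.com:PORT/app/stream/playlist.m3u8"
#   while true; do
#       grab_thumbnail "$HLS_URL" thumb.jpg
#       sleep 15
#   done
```

Because ffmpeg is started only once per snapshot, the stream is decoded for a moment every 15 seconds instead of continuously.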

Keukhan commented 2 years ago

@2002demon

I'm glad you solved it in a great way.

HLS-based thumbnail images are delayed compared to the WebRTC video. If you need a thumbnail image that is in sync with WebRTC, we recommend using the Thumbnail Publisher.

Hope you have a nice day.

Thanks.

ds88888888 commented 2 years ago

@basisbit

Hello, I'm a member of the OvenMediaEngine team. Thank you very much for your contribution. While reading your comments, I found something interesting and quoted it.

Last weekend I had an event with ~ 32000 unique attendees based on OME master from 13th of July.

I know you are busy, but if you share your story and experience with us, I think OME will develop further. If you are interested, please send an email to contact@airensoft.com.

Thank you!

Hammer, sorry this is a little off topic, but if you do manage to get a response for this, would you perhaps consider sharing it on OME's website as a case study or something like that? I think it will be very interesting to learn how OME is applied in various cases. Thanks!

basisbit commented 2 years ago

@ fcqpl As you can see, my environment is using 76.2% CPU at 100 RTMP inputs, and your environment is using 172% at 36 inputs.

Looking at your settings, StreamWorkerCount is set to 8. I set it to 1. StreamWorkerCount is the number of threads created per stream. So in your case there are 288 StreamWorkers running, each doing nothing but using 0.7% of the CPU. (This is probably due to excessive thread context switching.)

So you can save a lot of CPU usage by reducing StreamWorkerCount to 1. Also, for around 36 RTMP inputs, setting AppWorkerCount to 1 would be sufficient.

I'm asking because the documentation is not clear about this.

@getroot Was this mostly a renaming in the long run? So if my machine has 8 CPU cores with a total of 16 CPU threads, and there will be only a few concurrent streams but the maximum number of concurrent viewers the machine can support, should AppWorkerCount be 1 and StreamWorkerCount be something like 16? (Or 32, because a stream worker does not continue working while it waits for IO?)

Would be nice for such changes in the future to be mentioned somewhere (maybe readme.md) with instructions, so that people who don't want to read all the git history still can continue using OME without having to fear running into a problem each time after updating, but only in production when at high load.

getroot commented 2 years ago

@basisbit

For a small number of input streams and a lot of viewers, it's a good idea to set 1 for AppWorker and 16 for StreamWorker, as you might think. A higher number of Workers than the number of Cores is not helpful, so it is not recommended to go higher than that.
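
That rule of thumb (never more workers than logical CPUs) can be sketched as a small helper. `cap_workers` is a hypothetical name, and `nproc` is only used as the default when no CPU count is passed:

```shell
#!/bin/bash
# cap_workers REQUESTED [CPUS] -> REQUESTED, limited to the number of logical CPUs
cap_workers() {
    local requested="$1"
    local cpus="${2:-$(nproc)}"
    if [ "$requested" -gt "$cpus" ]; then
        echo "$cpus"
    else
        echo "$requested"
    fi
}

cap_workers 32 16   # prints 16 (32 would exceed the 16 CPU threads)
cap_workers 16 16   # prints 16
```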

Besides that, you may need to increase the number of other types of Workers. It is recommended to tune these values while testing with a performance test tool, because it all depends on the usage environment.

https://airensoft.gitbook.io/ovenmediaengine/performance-tuning

naanlizard commented 2 years ago

I'll necro this thread a little.

We have a use case with many incoming streams and many viewers to each stream - maybe 50 incoming streams typically, 0-50 viewers per stream (though median viewers per stream is probably around 10)

We deployed with the following configuration, and operational changes of the following -

Server.xml.txt

We experienced high CPU and memory usage after a short time, and are currently debugging what could have caused those.

Screen Shot 2022-04-07 at 11 20 35 AM

For reference, here are the past 7 days, the past two were with OME in production - notably higher CPU usage.

Screen Shot 2022-04-07 at 11 21 13 AM

Our thoughts are the following

basisbit commented 2 years ago

Perhaps HLS delivery by OME is inefficient in some way? But that would appear to be incorrect due to above discussion/testing.

Yes, from my experience with OME versions from half a year ago, HLS delivery requires roughly half as much CPU per viewer as WebRTC when there are very few streams.

The use case section on the gitbook is unhelpful

Absolutely agreed, especially now that there exist so many thread-count related parameters. This severely needs some proper load-testing for different use cases, getting rid of thread-count-settings which can easily be automatically derived from other settings or from CPU core / thread count, and then improve documentation for what actually has an impact for which method of video delivery to clients.

getroot commented 2 years ago

@naanlizard Thanks for sharing good information.

I suspect the biggest CPU consumer is JPEG/PNG encoding. This uses more CPU than expected.

<Image>
    <Codec>jpeg</Codec>
    <Framerate>1</Framerate>
    <Width>640</Width>
    <Height>360</Height>
</Image>
<Image>
    <Codec>png</Codec>
    <Framerate>1</Framerate>
    <Width>1920</Width>
    <Height>1080</Height>
</Image>

If you need 1 thumbnail every 10 seconds, you can also set the Framerate to 0.1.
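
To get a feel for what the Framerate value means in volume, here is a rough count of images encoded per minute. It assumes about 50 input streams (the typical figure mentioned earlier in this thread) and the two Image profiles shown above; `images_per_minute` is a hypothetical helper and counts encoded images only:

```shell
#!/bin/bash
# images_per_minute STREAMS PROFILES FRAMERATE -> images encoded per minute
images_per_minute() {
    LC_ALL=C awk -v s="$1" -v p="$2" -v f="$3" 'BEGIN { printf "%.0f\n", s * p * f * 60 }'
}

images_per_minute 50 2 1     # Framerate 1   -> prints 6000
images_per_minute 50 2 0.1   # Framerate 0.1 -> prints 600
```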

Perhaps the number of viewers did not have much effect on CPU usage. I guess that the CPU usage increased as the amount of Image encoding increased when the input stream increased.

naanlizard commented 2 years ago

That seems very plausible. Are there benchmarks for the image encoder? If not I will try and do some and compare with ffmpeg/rtmp.

This uses more CPU than expected.

Are you saying it is a known issue that OME uses more CPU than it should for generating screenshots/thumbnails? Or is that speculation?

We will adjust our config and report back on CPU usage, and update the gitbook if there's somewhere good to put it

getroot commented 2 years ago

In my experience, many people predicted that encoding to an image format (JPEG, PNG) would use less CPU. That's why I explained that it uses more CPU than expected. (Please excuse my poor English. I'm sorry.)

I very much welcome you to share your experiences and knowledge on GITBOOK. It's harder for me to write documentation than it is to write code.

naanlizard commented 2 years ago

The most difficult part for writing documentation is knowing where to put things, and to understand everything that OME does. I'll write as much as I can. Where is the best place to discuss OME functionality and such? I don't like opening new issues all the time but if that is the easiest method, I'll do it.

Your english is great! Don't worry. I only ask to clarify.

In our case we've had very low CPU usage when encoding thumbnails and screenshots with nginx-rtmp, and ffmpeg, so I think if OME uses a lot more CPU that would be a bug. We'll see.

getroot commented 2 years ago

I have no ideas yet other than here where we can discuss OME functionality and such. Please open a new issue.

@Keukhan If image encoding for thumbnails in OME uses more CPU than the ffmpeg CLI, then you need to look into it.

naanlizard commented 2 years ago

We are looking to reproduce it, but changing <AppWorkerCount>1</AppWorkerCount> to <AppWorkerCount>16</AppWorkerCount> caused WebRTC to fail to connect (via ssl). If we can reproduce it we'll start a new ticket

We're also working on a benchmark for ffmpeg vs OME thumbs, will report back

naanlizard commented 2 years ago

Finally got around to benchmarks

Quick and dirty first data collection shows OME using ~5x the CPU usage of NGINX when generating thumbnails (at best)

image

This is with this nginx config https://pastebin.com/raw/vsmeCRVd

and this OME config https://pastebin.com/raw/NsEun9Q7

Note that the resolution of the generated scaled images are slightly different I believe, but that shouldn't matter. OME is also generating thumbnails less frequently than the main image.

We can share a docker image with a little work for nginx-rtmp if that is helpful

We will collect more data tomorrow when we don't risk overloading our server with OME generating thumbnails :)

Keukhan commented 2 years ago

@naanlizard

Thanks for sharing the performance comparison results. nginx related data helped me a lot. I am aware of the problem of using a lot of CPU unnecessarily in the thumbnail generation process. However, I'm working on another major feature right now. So, I will review this issue as soon as possible.

If you have any ideas for optimization in the process of generating thumbnails in OME, please suggest them to me. :)

Thanks

naanlizard commented 2 years ago

I'm not familiar with how OME generates screenshots and thumbnails. Perhaps you are reading too many frames? I believe nginx-rtmp simply waits for a keyframe and uses that as the screenshot.

getroot commented 2 years ago

@naanlizard Yes, that is the main cause of the performance difference. Since OME supports FPS even in thumbnails, it reads and decodes every frame. (There are users for whom this is a requirement). In some cases, of course, there are ways to improve further. Kwon will do it in the future.
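
The size of that difference can be estimated by counting decoded frames. A back-of-the-envelope sketch, assuming a 30 fps input (as in the test input earlier in this thread) and a 2-second keyframe interval, which is purely an assumption; the helper names are hypothetical:

```shell
#!/bin/bash
# decoded_every_frame FPS -> frames decoded per minute when every frame is decoded
decoded_every_frame() {
    echo $(( $1 * 60 ))
}

# decoded_keyframes_only KEYFRAME_INTERVAL_SECONDS -> frames decoded per minute
# when only keyframes are decoded
decoded_keyframes_only() {
    echo $(( 60 / $1 ))
}

decoded_every_frame 30       # prints 1800
decoded_keyframes_only 2     # prints 30
```

Under these assumptions a keyframe-only approach decodes 60 times fewer frames, at the cost of not supporting arbitrary thumbnail frame rates.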

naanlizard commented 2 years ago

Good to hear it is a straightforward change. I'll continue CPU usage comparisons

naanlizard commented 1 year ago

I am excited for a more efficient OME thumbnail generation, it would be extremely convenient!