CasparCG / server

CasparCG Server is Windows and Linux software used to play out professional graphics, audio and video to multiple outputs. It has been in 24/7 broadcast production since 2006. Ready-to-use downloads are available under the Releases tab https://casparcg.com.
GNU General Public License v3.0

GPU memory leak on HTML producer #1265

Closed: rrebuffo closed this issue 11 months ago.

rrebuffo commented 4 years ago

Expected behaviour

VRAM usage should be freed upon closing of HTML templates or HTML producers.

Current behaviour

GPU memory usage keeps increasing when removing and adding HTML templates until VRAM is full and starts using shared video memory, slowing down rendering.


Steps to reproduce

  1. Specify <html><enable-gpu>true</enable-gpu></html> in config
  2. CG 1-1 ADD 0 "template1" 1 "data"
  3. CG 1-1 STOP
  4. Repeat steps 2 and 3 indefinitely.
  5. Watch the GPU memory usage of casparcg.exe go up in Task Manager.
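The add/stop loop in the steps above can be scripted against the AMCP TCP controller (port 5250 in the config shown later in this thread). This is a minimal Python sketch, assuming each command is sent as one CRLF-terminated line and that each CG command is answered with a single status line; the helper names `amcp_command`, `repro_commands` and `run` are hypothetical, and the template name and data are taken from step 2.

```python
import socket


def amcp_command(cmd: str) -> bytes:
    """Encode one AMCP command as a CRLF-terminated line."""
    return (cmd + "\r\n").encode("utf-8")


def repro_commands(iterations: int, channel_layer: str = "1-1",
                   template: str = "template1", data: str = "data"):
    """Yield the CG ADD / CG STOP pairs from the reproduction steps."""
    for _ in range(iterations):
        yield f'CG {channel_layer} ADD 0 "{template}" 1 "{data}"'
        yield f"CG {channel_layer} STOP"


def run(host: str = "localhost", port: int = 5250, iterations: int = 100) -> None:
    """Send the add/stop loop to a running server and print each reply."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\r\n") as reader:
            for cmd in repro_commands(iterations):
                sock.sendall(amcp_command(cmd))
                # Assumption: one status line per CG command.
                print(cmd, "->", reader.readline().strip())
```

Calling `run()` while watching Task Manager reproduces steps 2 to 5 without typing the commands by hand.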

Environment


Screenshots

Timelapse: https://img.mehir.ar/template.gif

dotarmin commented 4 years ago

Hi @rrebuffo, thanks for reporting this. I'm curious: how long does it take until VRAM is full on your hardware? How many add/stop iterations do you do before it fills up?

Best regards, Armin

rrebuffo commented 4 years ago

Every template added increases VRAM usage by around 16 MB. When I'm developing templates (testing small changes more than a hundred times), it fills up every few hours. The system gets really slow when total GPU memory usage is around 25% above the dedicated GPU memory: my card has 2 GB of VRAM, it slows down approaching 1900 MB and becomes laggy system-wide above 2600 MB, clearly falling back to shared memory.
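The numbers above allow a rough estimate of how many add/stop cycles it takes to reach a given usage level. This is a minimal Python sketch, assuming usage grows by a fixed amount per cycle; the baseline usage at startup is not given in the thread, so the 300 MB in the example below is a hypothetical value.

```python
import math


def cycles_until(threshold_mb: float, baseline_mb: float,
                 leak_per_cycle_mb: float) -> int:
    """Smallest number of cycles n with baseline + n * leak >= threshold."""
    if leak_per_cycle_mb <= 0:
        raise ValueError("leak_per_cycle_mb must be positive")
    remaining = threshold_mb - baseline_mb
    if remaining <= 0:
        return 0  # already at or above the threshold
    return math.ceil(remaining / leak_per_cycle_mb)
```

For example, `cycles_until(1900, 300, 16)` gives 100 cycles before the slowdown point on the 2 GB card, under that hypothetical baseline.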

dotarmin commented 4 years ago

@rrebuffo, thanks for the additional information. I have put the issue into the v2.3.0 LTS milestone.

dimitry-ishenko commented 4 years ago

@dotarmin is there an announcement somewhere about the LTS version? (sorry for hijacking the issue)

dotarmin commented 4 years ago

@dimitry-ishenko, it will be announced very soon :)

sirfnomi commented 4 years ago

I'm unable to reproduce this issue. Maybe it is template-related? Or try updating the graphics driver.

dotarmin commented 4 years ago

@rrebuffo, can you provide us with the template you're using to reproduce this error?

rrebuffo commented 4 years ago

Any template will do. I just checked with PLAY 1-1 [HTML] google.com: same problem, same 16 MB increases. I can also point out that this happens on two different machines with different OSes, driver versions and hardware.

rrebuffo commented 4 years ago

On a fresh copy of the latest build, with casparcg.config edited only to add <html><enable-gpu>true</enable-gpu></html>, and run from casparcg_auto_restart.bat:

[2020-04-11 05:28:47.679] [info]    ############################################################################
[2020-04-11 05:28:47.680] [info]    CasparCG Server is distributed by the Swedish Broadcasting Corporation (SVT)
[2020-04-11 05:28:47.680] [info]    under the GNU General Public License GPLv3 or higher.
[2020-04-11 05:28:47.680] [info]    Please see LICENSE.TXT for details.
[2020-04-11 05:28:47.680] [info]    http://www.casparcg.com/
[2020-04-11 05:28:47.680] [info]    ############################################################################
[2020-04-11 05:28:47.680] [info]    Starting CasparCG Video and Graphics Playout Server 2.3.0 4176a9b1 Dev
[2020-04-11 05:28:48.270] [info]    Initializing OpenGL Device.
[2020-04-11 05:28:48.279] [info]    Initialized OpenGL 4.5.0 NVIDIA 441.66 NVIDIA Corporation
[2020-04-11 05:28:48.350] [info]    D3D11: Selected adapter: NVIDIA GeForce GTX 960
[2020-04-11 05:28:48.350] [info]    D3D11: Selected feature level: 45312
[2020-04-11 05:28:48.359] [info]    Initialized ffmpeg module.
[2020-04-11 05:28:48.359] [info]    Initialized oal module.
[2020-04-11 05:28:48.360] [info]    Initialized decklink module.
[2020-04-11 05:28:48.360] [info]    Initialized screen module.
[2020-04-11 05:28:48.360] [info]    Initialized newtek module.
[2020-04-11 05:28:48.412] [info]    Initialized html module.
[2020-04-11 05:28:48.685] [info]    Initialized flash module.
[2020-04-11 05:28:48.687] [info]    Initialized bluefish module.
[2020-04-11 05:28:48.687] [info]    Initialized image module.
[2020-04-11 05:28:48.687] [info]    "C:/CasparCG\server_2.3_4176a9b1\casparcg.config":
[2020-04-11 05:28:48.687] [info]    -----------------------------------------
[2020-04-11 05:28:48.687] [info]    <?xml version="1.0" encoding="utf-8"?>
[2020-04-11 05:28:48.687] [info]    <configuration>
[2020-04-11 05:28:48.687] [info]       <paths>
[2020-04-11 05:28:48.687] [info]          <media-path>media/</media-path>
[2020-04-11 05:28:48.687] [info]          <log-path>log/</log-path>
[2020-04-11 05:28:48.687] [info]          <data-path>data/</data-path>
[2020-04-11 05:28:48.687] [info]          <template-path>template/</template-path>
[2020-04-11 05:28:48.687] [info]       </paths>
[2020-04-11 05:28:48.687] [info]       <lock-clear-phrase>secret</lock-clear-phrase>
[2020-04-11 05:28:48.687] [info]       <channels>
[2020-04-11 05:28:48.687] [info]          <channel>
[2020-04-11 05:28:48.687] [info]             <video-mode>720p5000</video-mode>
[2020-04-11 05:28:48.687] [info]             <consumers>
[2020-04-11 05:28:48.687] [info]                <screen/>
[2020-04-11 05:28:48.687] [info]                <system-audio/>
[2020-04-11 05:28:48.687] [info]             </consumers>
[2020-04-11 05:28:48.687] [info]          </channel>
[2020-04-11 05:28:48.687] [info]       </channels>
[2020-04-11 05:28:48.687] [info]       <controllers>
[2020-04-11 05:28:48.687] [info]          <tcp>
[2020-04-11 05:28:48.687] [info]             <port>5250</port>
[2020-04-11 05:28:48.687] [info]             <protocol>AMCP</protocol>
[2020-04-11 05:28:48.687] [info]          </tcp>
[2020-04-11 05:28:48.687] [info]       </controllers>
[2020-04-11 05:28:48.687] [info]       <amcp>
[2020-04-11 05:28:48.687] [info]          <media-server>
[2020-04-11 05:28:48.687] [info]             <host>localhost</host>
[2020-04-11 05:28:48.687] [info]             <port>8000</port>
[2020-04-11 05:28:48.687] [info]          </media-server>
[2020-04-11 05:28:48.687] [info]       </amcp>
[2020-04-11 05:28:48.687] [info]       <html>
[2020-04-11 05:28:48.687] [info]          <enable-gpu>true</enable-gpu>
[2020-04-11 05:28:48.687] [info]       </html>
[2020-04-11 05:28:48.687] [info]    </configuration>
[2020-04-11 05:28:48.687] [info]    -----------------------------------------
[2020-04-11 05:28:48.707] [info]    Initialized OpenGL Accelerated GPU Image Mixer for channel 1
[2020-04-11 05:28:48.709] [info]    video_channel[1|720p5000] Successfully Initialized.
[2020-04-11 05:28:48.711] [info]    Screen consumer [1|720p5000] Initialized.
[2020-04-11 05:28:48.774] [info]    oal[1|720p5000] Initialized.
[2020-04-11 05:28:48.775] [info]    Initialized channels.
[2020-04-11 05:28:48.776] [info]    Initialized controllers.
[2020-04-11 05:28:48.777] [info]    Initialized osc.
[2020-04-11 05:29:02.698] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:05.306] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:05.515] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:06.306] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:06.614] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:07.178] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:07.474] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:08.082] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:08.394] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:09.194] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:09.474] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:17.290] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:17.594] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:18.322] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:18.574] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:22.738] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:23.054] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:23.882] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:24.094] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:24.994] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:25.294] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:26.074] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:26.374] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:52.874] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:53.194] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:53.314] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:53.594] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:53.738] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:53.954] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:54.130] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:54.434] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:54.522] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:54.794] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:54.890] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:55.154] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:55.250] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:55.534] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:55.634] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:55.914] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:55.962] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:56.294] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:56.314] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:56.594] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:56.674] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:56.974] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:58.018] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:58.374] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:58.506] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:58.814] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:59.010] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:59.294] [info]    html[google.com] Destroyed.
[2020-04-11 05:29:59.514] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:29:59.794] [info]    html[google.com] Destroyed.
[2020-04-11 05:30:00.018] [info]    Received message from Console: PLAY 1-1 [html] google.com\r\n
[2020-04-11 05:30:00.234] [info]    html[google.com] Destroyed.

The final VRAM usage after that is 278,108 K.
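Every PLAY after the first replaces the previous producer, so each later PLAY in the log above should be followed by one "Destroyed." line. A minimal Python sketch for tallying this from log lines; the regular expressions match only the line shapes shown in this log and are an assumption about the log format in general.

```python
import re
from collections import Counter

# Matches e.g. "Received message from Console: PLAY 1-1 [html] google.com"
PLAY_RE = re.compile(r"Received message from \S+: PLAY \S+ \[html\]", re.IGNORECASE)
# Matches e.g. "html[google.com] Destroyed."
DESTROYED_RE = re.compile(r"html\[[^\]]*\] Destroyed\.")


def tally(lines):
    """Count HTML PLAY commands and producer destructions in log lines."""
    counts = Counter()
    for line in lines:
        if PLAY_RE.search(line):
            counts["play"] += 1
        elif DESTROYED_RE.search(line):
            counts["destroyed"] += 1
    # Producers created but not (yet) destroyed; 1 means only the one on air.
    counts["alive"] = counts["play"] - counts["destroyed"]
    return counts
```

Any iterable of lines works, such as an open log file. For the excerpt above it counts 28 PLAY lines and 27 "Destroyed." lines, leaving only the producer currently on air, so the producers themselves do appear to be cleaned up.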

sirfnomi commented 4 years ago

@rrebuffo, maybe it is system-related, because I tried again with the same config as yours and PLAY 1-1 [HTML] google.com, and CasparCG works normally.

I'm testing on Windows 10 x64, on a Dell T5500 with a Quadro 2000 and the latest driver (with the old driver, CasparCG was crashing).

dotarmin commented 4 years ago

@rrebuffo, can you test with <system-audio/> removed? I know we had issues with a memory leak when using system-audio, but as far as I can see that should have been fixed.

Can you also try upgrading your drivers, as advised above 😊

Thanks

rrebuffo commented 4 years ago

I tried updating to the latest graphics driver with clean settings and also removing system audio, even though I had previously tested with only one Decklink or one NDI consumer. The only change is that the increments are now 8 MB. This is very frustrating; as I said, it happens on two different machines. Could it be the Windows installation? The only thing they have in common is the installation media. Windows is build 18363.418 (I did not update it at all).

rrebuffo commented 4 years ago

I tested again on an old Windows 7 test installation and got the same results with build 4176a9b1bcd03dd22061f30110b1664d8216c881 and the latest NVIDIA drivers, using a clean config file modified only to enable the GPU for HTML. Repeatedly playing out PLAY 1-1 [html] google.com, the Windows 7 memory load goes straight up to 100% after around 150 commands.

hreinnbeck commented 4 years ago

When you've gotten to 100%, does a CLEAR 1-1 free the VRAM or not?

rrebuffo commented 4 years ago

No. It sits there until the server is shut down. Neither CLEAR 1-1 nor CLEAR 1 have any effect.

hreinnbeck commented 4 years ago

OK, could you check whether GL GC clears it?

rrebuffo commented 4 years ago

It goes down a couple MB but it's not freed.

hreinnbeck commented 4 years ago

Alright, I think that suggests this is a problem with CEF not freeing the memory.

Julusian commented 4 years ago

Can you show what GL INFO reports and what the DIAG window looks like?

I need to try to reproduce this myself, and if I cannot, I'll come back with a special build designed to narrow down the cause.

rrebuffo commented 4 years ago

Screenshots: baseline, past_100%.

DIAG doesn't tell much because there's no other load on the system. UWP apps and Visual Studio Code become unusable; a simple scroll takes half a second to render. I'm also testing with one monitor fewer (only a 1080p one; usually I have that plus a 2160p one), which helps performance.

Julusian commented 4 years ago

Is the second screenshot from when it had run out of memory? I was wondering whether we were leaking producers, which might show up in DIAG. GL INFO will show the total memory we still have allocated/cached on the GPU (unless we have truly leaked some).

rrebuffo commented 4 years ago

Yes, it is. I think the producers are cleaned up and don't show up in DIAG (unlike in the ffmpeg RTMP bug, where they did), but the memory they used is clearly not being released.

TondaKrist commented 3 years ago

Problem appears only if HTML GPU option is turned on in config.

Sidonai-1 commented 3 years ago

> Problem appears only if HTML GPU option is turned on in config.

Is it still present in 2.3.2?

TondaKrist commented 3 years ago

> Is it still present in 2.3.2?

Yes, it is: https://github.com/CasparCG/server/issues/1363

tvt-devteam commented 3 years ago

Regarding the following commands mentioned above:

  - GL INFO: Retrieves information about the allocated and pooled OpenGL resources.
  - GL GC: Releases all the pooled OpenGL resources. May cause a pause on all video channels.
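A before/after check of these two commands can be scripted over AMCP. This is a minimal Python sketch, assuming the server listens on localhost:5250 as in the config above; because a GL INFO reply may span several lines, `send_amcp` simply collects whatever arrives before a timeout instead of parsing the reply format. Both function names are hypothetical.

```python
import socket


def send_amcp(cmd: str, host: str = "localhost", port: int = 5250,
              timeout: float = 2.0) -> str:
    """Send one AMCP command and return whatever arrives before `timeout` expires."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((cmd + "\r\n").encode("utf-8"))
        chunks = []
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # nothing more within the timeout: treat the reply as complete
    return b"".join(chunks).decode("utf-8", errors="replace")


def gc_and_report(send):
    """Snapshot GL INFO, run GL GC, then snapshot GL INFO again."""
    before = send("GL INFO")
    gc_reply = send("GL GC")
    after = send("GL INFO")
    return before, gc_reply, after
```

`gc_and_report(send_amcp)` returns both GL INFO snapshots for comparison. These commands cover only the pooled OpenGL resources, which is consistent with GL GC freeing just a few MB of the HTML producer leak earlier in this thread.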

Questions I have are:

The documentation mentions “May cause a pause on all video channels.” Having done some testing, I have not seen any pausing in either audio or video, nor does it affect the loaded templates.

Julusian commented 3 years ago

@tvt-devteam

> What exactly are the buffers seen in the INFO output?

They are the currently unused OpenGL buffers. Each frame is stored in one of these buffers for compositing. They are pooled because allocating and freeing buffers is not always cheap, so it is better to keep them and reuse them.

> Even with <enable-gpu>false</enable-gpu>, how come the GPU is still utilised? Is it possible to disable OpenGL, if it might be partly related to the GPU memory issues?

The enable-gpu option belongs to the HTML producer and controls whether it uses the GPU internally. OpenGL is also used to composite all layers on the channel and cannot be disabled. Disabling it was possible in 2.1, but that had limitations and was deemed to add little value, so it was removed.

> With 24/7 operation in mind, is it good practice, or even required, to run GL GC once in a while?

Only if you are playing odd-resolution clips. If all of your source media has the same resolution, the same buffers are reused each time, and more are allocated only when the pool starts to run out. But if you occasionally play odd resolutions, buffers will be allocated that may never be used again, so a GL GC will help free that memory if you need to.
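The pooling behaviour described above can be illustrated with a toy, size-keyed pool. This is a minimal Python sketch assuming buffers are keyed only by resolution; it is not CasparCG's actual C++/OpenGL implementation, and `BufferPool` and its methods are hypothetical names. It shows why same-resolution media keeps reusing one buffer, why an odd resolution leaves an extra idle buffer in the pool, and what a GL GC-style clear releases.

```python
from collections import defaultdict


class BufferPool:
    """Toy pool: reuse an idle buffer of the same size, otherwise allocate."""

    def __init__(self):
        self._free = defaultdict(list)  # (width, height) -> idle buffers
        self.allocations = 0

    def acquire(self, width: int, height: int) -> bytearray:
        key = (width, height)
        if self._free[key]:
            return self._free[key].pop()  # reuse, no new allocation
        self.allocations += 1
        return bytearray(width * height * 4)  # stand-in for a BGRA texture

    def release(self, buf: bytearray, width: int, height: int) -> None:
        self._free[(width, height)].append(buf)  # keep it for later reuse

    def pooled_bytes(self) -> int:
        return sum(len(b) for bufs in self._free.values() for b in bufs)

    def gc(self) -> None:
        """Analogue of GL GC: drop every idle buffer."""
        self._free.clear()
```

Only idle pooled buffers are released by `gc()` here; memory held outside the pool is untouched.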

> Documentation mentions “May cause a pause on all video channels.”

It states this because freeing the memory can take a few milliseconds, depending on the GPU and driver. Additionally, the next time a buffer is needed a new one has to be allocated (likely for the next frame after GL GC runs), which again can take a few milliseconds depending on the GPU and driver. It is recommended to preload videos rather than play them directly, so that playback is smooth and doesn't stutter because of slow buffer allocations.
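In AMCP terms, preloading usually means loading the clip into the background with LOADBG ahead of time and later taking it to air with a bare PLAY, instead of a single PLAY with the clip name. This is a minimal Python sketch of the two command sequences; the helper names and the clip name "AMB" are placeholders, and `send` can be any function that sends one AMCP command and returns its reply.

```python
def preload_then_play(channel_layer: str, clip: str) -> list:
    """Load the clip in the background first, then take it to air."""
    return [f"LOADBG {channel_layer} {clip}", f"PLAY {channel_layer}"]


def direct_play(channel_layer: str, clip: str) -> list:
    """Load and play in one step; buffers are allocated as playback starts."""
    return [f"PLAY {channel_layer} {clip}"]


def run_sequence(send, commands: list) -> list:
    """Send commands in order through `send` and collect the replies."""
    return [send(c) for c in commands]
```

Ideally the LOADBG is sent well before the clip is needed, so any allocation cost is paid before the PLAY rather than during playback.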

rrebuffo commented 2 years ago

Tested this using the new build from @Julusian with updated CEF 95. This issue is still present.