DaemonEngine / Daemon

The Dæmon game engine. With some bits of ioq3 and XreaL.
https://unvanquished.net
BSD 3-Clause "New" or "Revised" License
305 stars 60 forks source link

memory leak #798

Closed ghost closed 9 months ago

ghost commented 1 year ago

Since the release, the game no longer frees lot of memory (more than 100MB on plat23) when a game ends, ultimately leading to crashes for an allocation failure.

On game's menu screen:

  RSS    VSZ COMMAND
 1724  14016 crash_server
209164 1356584 daemon
69724 88153672 nacl_helper_boo

Loading thunder:

  RSS    VSZ COMMAND
 1724  14016 crash_server
645020 1968716 daemon

Re-loading thunder:

  RSS    VSZ COMMAND
 1724  14016 crash_server
803548 2125344 daemon

One more time:

RSS    VSZ COMMAND
 1724  14016 crash_server
943952 2265668 daemon

Restarting game:

 1724  14016 crash_server
205904 1355984 daemon
69500 88153672 nacl_helper_boo

Loading anthill:

  RSS    VSZ COMMAND
 1724  14016 crash_server
652864 1918148 daemon

Re-loading it:

  RSS    VSZ COMMAND
 1724  14016 crash_server
760608 2035192 daemon

When trying whith plat23, I had similar RAM increase over time, so I do not think it's related to the map itself. Even test-doors increase RAM usage by similar amount.

@cu-kai found this (I only was wondering why game was more crashy than before, suspecting another driver issue or something along those lines).

When it runs for long enough to saturate memory, here's the error message one gets:

/map test-doors  
==== ShutdownGame ==== 
----- Server Initialization ----- 
Map: test-doors 
[FS] Loading pak 'Unvanquished/pkg/unvanquished_0.54.0.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-common_2.3.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-players_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-legacy_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-weapons_0.54-dirty.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-buildables_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-voices_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-soundtrack_0.54.dpk'... 
[FS] Loading pak 'unvanquished/pkg/map-test-doors_0.1.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-space_1.3.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-trak5_1.3.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-pk02_1.3.dpk'... 
Extracting VM module sgame-amd64.nexe from Unvanquished/pkg/unvanquished_0.54.0.dpk...

Loading VM module sgame-amd64.nexe... 
Using loader args:  Unvanquished/nacl_helper_bootstrap Unvanquished/nacl_loader --r_debug=0xXXXXXXXXXXXXXXXX --reserved_at_zero=0xXXXXXXXXXXXXXXXX -v -B Unvanquished/irt_core-amd64.nexe -e -i 100:35 -- .local/unvanquished/sgame-amd64.nexe 100 
Warn: IPC: Socket closed by remote end 
[FS] Loading pak 'Unvanquished/pkg/unvanquished_0.54.0.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-common_2.3.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-players_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-legacy_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-weapons_0.54-dirty.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-buildables_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-voices_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-soundtrack_0.54.dpk'... 
----- Server Shutdown ----- 
Error during sgame shutdown: IPC: Failed to send message: Broken pipe 
Warn: VM exited with non-zero exit code 250

IP: 127.0.0.1 
IP: 192.168.1.102 
IP6: ::1 
IP6: fd24:fb65:397b:3900:7285:c2ff:fea6:e36 
IP6: fe80::7285:c2ff:fea6:e36%eth1 
Opening IP6 socket: [::]:* 
Opening IP socket: 0.0.0.0:* 
---------------------------

[FS] Loading pak 'Unvanquished/pkg/unvanquished_0.54.0.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/tex-common_2.3.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-players_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-legacy_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-weapons_0.54-dirty.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-buildables_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-voices_0.54.dpk'... 
[FS] Loading pak 'Unvanquished/pkg/res-soundtrack_0.54.dpk'... 
Extracting VM module cgame-amd64.nexe from Unvanquished/pkg/unvanquished_0.54.0.dpk...

Loading VM module cgame-amd64.nexe... 
Using loader args:  Unvanquished/nacl_helper_bootstrap Unvanquished/nacl_loader --r_debug=0xXXXXXXXXXXXXXXXX --reserved_at_zero=0xXXXXXXXXXXXXXXXX -v -B Unvanquished/irt_core-amd64.nexe -e -i 100:30 -- .local/unvanquished/cgame-amd64.nexe 100 
Loaded VM module in 88 msec 
Warn: Language "French (France)" (fr_FR) not found. Default to "English" (en) 
Loading Lua plugin using a new Lua state. 
Loaded font face 'unifont' [regular] from 'fonts/unifont.ttf'. 
Loaded font face 'Roboto' [bold] from 'fonts/Roboto-Bold.ttf'. 
Loaded font face 'Roboto' [italic, bold] from 'fonts/Roboto-BoldItalic.ttf'. 
Loaded font face 'Roboto' [italic] from 'fonts/Roboto-Italic.ttf'. 
Loaded font face 'Roboto' [regular] from 'fonts/Roboto-Regular.ttf'. 
Loaded font face 'FontAwesome' [regular] from 'fonts/fontawesome-webfont.ttf'. 
Loaded font face 'DejaVu Sans' [regular] from 'fonts/DejaVuSansCondensed.ttf'. 
Loaded font face 'Material Icons' [regular] from 'fonts/MaterialIcons-Regular.ttf'. 
Calling GetRefAPI… 
GL_VENDOR: Intel 
GL_RENDERER: Mesa Intel(R) UHD Graphics 630 (CFL GT2) 
GL_VERSION: 4.6 (Core Profile) Mesa 20.3.5 
GL_MAX_TEXTURE_SIZE: 16384 
GL_SHADING_LANGUAGE_VERSION: 4.60 
GL_MAX_VERTEX_UNIFORM_COMPONENTS 16384 
GL_MAX_VERTEX_ATTRIBS 16 
Occlusion query bits: 64 
GL_MAX_DRAW_BUFFERS: 8 
GL_TEXTURE_MAX_ANISOTROPY_EXT: 16.000000 
GL_MAX_RENDERBUFFER_SIZE: 16384 
GL_MAX_COLOR_ATTACHMENTS: 8 
PIXELFORMAT: color(24-bits) 
MODE: -1, 1366 x 768 fullscreen hz: N/A 
Using OpenGL version 4.6, requested: 3.2 
Using OpenGL 3.x context. 
Using an OpenGL core profile. 
OpenGL 3.x context is not forward compatible. 
Using OpenGL extensions: GL_ARB_half_float_pixel GL_ARB_texture_float GL_EXT_gpu_shader4 GL_EXT_texture_integer GL_ARB_texture_rg GL_ARB_texture_gather GL_EXT_texture_compression_s3tc GL_ARB_texture_compression_rgtc GL_EXT_texture_filter_anisotropic GL_ARB_half_float_vertex GL_ARB_framebuffer_object GL_ARB_get_program_binary GL_ARB_buffer_storage GL_ARB_uniform_buffer_object GL_ARB_map_buffer_range GL_ARB_sync 
Using S3TC (DXTC) texture compression. 
Using GPU vertex skinning with max 256 bones in a single pass, models are hardware accelerated. 
Extracting VM module cgame-amd64.nexe from Unvanquished/pkg/unvanquished_0.54.0.dpk...

Loading VM module cgame-amd64.nexe... 
Using loader args:  Unvanquished/nacl_helper_bootstrap Unvanquished/nacl_loader --r_debug=0xXXXXXXXXXXXXXXXX --reserved_at_zero=0xXXXXXXXXXXXXXXXX -v -B Unvanquished/irt_core-amd64.nexe -e -i 100:31 -- .local/unvanquished/cgame-amd64.nexe 100 
Warn: Error during error handling: IPC: Socket closed by remote end 
]
ghost commented 1 year ago

As a side note, it would be nice to have real error messages, here, it would be nice to be said "out of memory": I would have reported this long ago with such a message. It would also be nice to not display full file path in the logs, instead saying once where the homepath and pakpath are, and then using, dunno, $HOMEPATH and $PATHPAK as strings? It would make the logs less noisy and easier to remove info about how filesystem is organised locally.

illwieckz commented 1 year ago

As a side note, it would be nice to have real error messages, here.

What suffers from OOM is likely nacl_loader here, so unless we have a way to produce more logs from nacl_loader, there is not much things we can do. Like, it's not our software that got the OOM. There are ways to get more logs from nacl_loader though, but I don't know if OOM is part of them.

illwieckz commented 1 year ago

By running daemon-tty over valgrind and switching to some maps in a row, it only reports less than a kbytes wasted.

By doing the same with daemon, almost all the leakage it reports comes from the OpenGL driver on my end.

So, either I don't reproduce the bug, either we don't free something allocated by the OpenGL driver, if we have too do it ourselves.

illwieckz commented 1 year ago

I noticed that doing disconnect while being on a map also increases memory like when loading a new map.

I also noticed loading a new map from main menu while the same map was loaded before (but leaved with /disconnect) doesn't increase memory.

Loading the same map in a row increases memory.

So it's like some loaded data from map is kept and reused if loading the same map again, but quitting a map always loads some data (going from main menu, or going to the same map again).

illwieckz commented 1 year ago

Actually the size of loaded images is similar to the leak, I don't know if that's related.

illwieckz commented 1 year ago

I don't know if pic buffer passed to re.GenerateTexture is freed.

ghost commented 1 year ago

Some bisecting results when trying to hunt this:

auto-end: 1ff073fa2, leaks 39
auto-end: 25cb8c416, leaks 42
no-run: 6745fd943  43  
no-run: 0d77c2135  55  
no-run: 8086bc0f8 129 
no-run: 8a51faa5e 131 
no-run: 966b58afb 138 
no-run: 681fb35c1 139 
no-run: a60858529 152 

python-fail: 73c5cd148 275 
python-fail: 7bc60ee34 292 

The end number is the distance related to aa0df65ca, the last commit I have in common with you, IIRC. Note that it's not important as I have not changed daemon's code, and the problem is clearly on daemon-side.

So, about the 1st part of those lines:

Those are all unvanquished commit references. I do not intend to work more on this regression for the foreseeable future.

ghost commented 1 year ago

I gave a shot today with my local "RelWithDebInfo" build, which includes ~10 months of changes plus some local personal ones (warning fixes mostly, nothing important):

% ps --no-headers -orss,vsz,comm $(pidof daemon )
249688 1512248 daemon <- cold boot
619316 2112704 daemon <- 1st /map plat23
745252 2226736 daemon <- 2nd /map plat23
783184 2257284 daemon <- 3rd /map plat23
779296 2252868 daemon <- 4th /map plat23
791592 2265984 daemon <- 5
789544 2264700 daemon <- 6

As the results obviously show, the big leak of 150-200 megs is gone for me. Or rather, will be when the release will be done.