BlockySurvival / issue-tracker

A non-code repo for tracking issues w/ the Blocky Survival minetest server

Fast memory leak #53

Open fluxionary opened 5 years ago

fluxionary commented 5 years ago

BlS is leaking memory, which has so far resulted in taking out the entire server instance (not just BlS) at least once.

  1. We need to track down the memory leak, though this is exceedingly difficult.
  2. We need to enforce some way of limiting the amount of memory BlS allocates, so that it does not affect the performance of the other games running on the same server.

Item 2 can be achieved by using ulimit or something to the same effect. The following (old) links have descriptions of several solutions:

fluxionary commented 5 years ago

I talked to Terumoc, and he had some insight into this. He told me to run collectgarbage("count") to get a measure of how much memory Lua is using. This has shown that the issue is not a mod - lua memory usage remains around 120-150mb no matter how much memory the server is chewing up (there's a button in the Admin HQ).
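To put the Lua figure next to the process total, the resident set size can be read from /proc on Linux. A minimal sketch, assuming a Linux host; the function name and the pidof lookup in the usage comment are illustrative and not taken from the server's setup:

```shell
# Print the VmRSS value (in kB) from a /proc/<pid>/status style file.
# $1 = path to the status file.
rss_kib_from_status() {
    awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Hypothetical usage, assuming a single process named minetestserver:
#   rss_kib_from_status "/proc/$(pidof -s minetestserver)/status"
```

If this number keeps climbing while collectgarbage("count") stays flat, the growth is happening outside the Lua heap.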

He also reported that this is apparently a known issue in minetest 5.0+, having something to do with pathing for entities. I haven't found the reference to that issue yet, but I'll put it here when I can.

fluxionary commented 5 years ago

note: watchdog mod which reboots the server if it starts lagging too much https://git.rudin.io/minetest/watchdog/src/master/init.lua

fluxionary commented 5 years ago

fancy-schmansy monitoring mod https://github.com/thomasrudin-mt/monitoring

ExeVirus commented 5 years ago

I'll see what I can find in the core game on entity pathing. Perhaps I can help fix the known issue.

fluxionary commented 5 years ago

I've found that this might be caused by our use of a flat-file for our authentication DB: https://github.com/minetest/minetest/pull/7279 Note that updating is already #33

@Billy-S : At your convenience, could you migrate the auth file to SQLite? I'd do it myself, but I can't shutdown the server without it starting up again immediately...

fluxionary commented 5 years ago

And as a further note, while the nightly reboots are annoying, we can get by for another couple weeks if you're busy.

ragulanramkumar commented 5 years ago

Can a back up be made today? I can migrate the database, but I don't want to run the risk of it going wrong

fluxionary commented 5 years ago

@xerox123official Backup of what? I've discovered I can use the ".backup" sqlite command to back up most of the DBs while the server is running, but I can't do that for the main map, because it's constantly locked. Backing up the auth file should be quite easy, though, since it's a flat file.
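For reference, the ".backup" approach can be wrapped roughly like this. It is only a sketch: the function name and all paths are placeholders, and it assumes the sqlite3 command-line tool is installed on the host.

```shell
# Back up one SQLite database using the sqlite3 CLI's .backup command.
# $1 = source database, $2 = destination file.
# Paths containing single quotes are not handled by this sketch.
backup_sqlite() {
    sqlite3 "$1" ".backup '$2'"
}

# Hypothetical usage with placeholder paths:
#   backup_sqlite /path/to/world/players.sqlite /path/to/backups/players.sqlite
#   cp /path/to/world/auth.txt /path/to/backups/auth.txt   # flat file, plain copy
```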

While we're on the topic, it'd be great if we had a regular backup procedure again..

ragulanramkumar commented 5 years ago

To stop the server from starting up again just kill the restart script, it's called something like start_mt.sh, then shutdown the server. When you want to restart the server just run that script and fork it to the background with ./path/to/script.sh &

fluxionary commented 5 years ago

@xerox123official Luk gave me sudo access to run Billy's backup script, which I think does all that stuff itself, correct? I plan on running it around 5AM UTC when the server is quietest. I'll post a note when I do.

ragulanramkumar commented 5 years ago

Yup, you can schedule it in Billy's crontab or something

luk3yx commented 5 years ago

To be clear: They can only use sudo to run the script, nothing else.

fluxionary commented 5 years ago

The migration went well, we'll have to check back in a day to see if memory is still leaking though.

krypticbit commented 5 years ago

I'll see if I can change the backup script so that it doesn't require sudo (I didn't think it did)

krypticbit commented 5 years ago

It seems that the latest backup is from July 17 (there are actually three of them from that date), so at least the script works when run manually.

krypticbit commented 5 years ago

Hmm, it appears that I can't change the script to not need sudo; the script needs to be run as me so that minetest runs under my name. If it doesn't run under my name, it won't work next time. For now, sudo -u billys <command> should be fine.

fluxionary commented 5 years ago

The memory leak has not gone away, and if anything, it is worse than before. The server had chewed up 20GB of memory in 24 hours when it was rebooted.
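To make that rate concrete, the average growth can be worked out from two readings. A rough sketch, assuming usage started near zero after the reboot and treating "20GB" as 20 GiB; the function name is made up for illustration:

```shell
# Average growth in MiB per hour between two memory readings.
# $1 = start (KiB), $2 = end (KiB), $3 = elapsed hours.
# Integer arithmetic, so the result is rounded down.
growth_mib_per_hour() {
    echo $(( ($2 - $1) / 1024 / $3 ))
}

# 20 GiB = 20971520 KiB over 24 hours:
growth_mib_per_hour 0 20971520 24   # prints 853
```

That is roughly 850 MiB per hour on average under those assumptions.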

I've created a topic on the minetest forums to solicit help: https://forum.minetest.net/viewtopic.php?f=6&t=22882

ExeVirus commented 5 years ago

Something to note, I run many of the mods of this server on my family's local server for the game, and I have never once had a memory leak issue.

My guess then is that the build for Linux (I'm running windows) may have a memory leak or one of the dependencies introduces a memory leak.

I will take it upon myself to collect all the mods from the bls_mods page (are there others besides lasers I should be aware of?) and put them into a test client (5.1) and run that on both Windows (7) and Debian 18 LTS.

I'm unsure if I'll find anything, but if I don't, then the only code this could be coming from would be the net code, which is something I cannot easily test myself.

fluxionary commented 5 years ago

No need to "collect" all the mods - clone the bls_mods repo, then run "git submodule update --recursive --init". Thanks for your effort, but I don't think you'll find anything if the server isn't heavily using all the available mods. I've never noticed the leak in my local clone of the server, but (1) I don't have the BlS map (2) I never run the server for all that long...

Also note, that recently LS-Wonderland, on the same host but running minetest 0.4.17.1, has been experiencing a memory leak as well, though it is much slower than ours.

niwla23 commented 5 years ago

Have you ever tried to switch the engine? So recompiling everything? Or what happens when using another map?

niwla23 commented 5 years ago

Also, if it's at spawn, why not just switch to the new spawn? It looks finished.

fluxionary commented 5 years ago

@Niwla23 The issue has followed us through at least a couple upgrades to the minetest engine. I've tried replicating the issue on my local world, but I've had no luck. Switching to new spawn will have exactly no impact on this issue.

fluxionary commented 5 years ago

update: 1) This seems to really ramp up when more players who run large techpack/terumet factories are active. We need to investigate why. 2) @Billy-S we could still mitigate the issue through use of ulimit. We should make sure that our software sucking up memory can't crash other software running on the same hardware (see the early links to stackexchange for instructions). I'd suggest limiting its memory usage to 8 GB.
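A minimal sketch of what the ulimit approach could look like, assuming the server is launched from a shell script. The function names and the server command line are placeholders, not the real BlS launch command. Note that `ulimit -v` caps virtual address space rather than resident memory, so a process that hits the cap will see allocations fail (and will probably crash) instead of being slowed down.

```shell
# Convert GiB to the KiB units that the shell's `ulimit -v` expects.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Run a command under a virtual-memory cap.
# $1 = cap in GiB, remaining arguments = the command to run.
# The subshell keeps the limit from applying to the calling shell.
run_capped() {
    limit_kib=$(gib_to_kib "$1")
    shift
    (
        ulimit -v "$limit_kib" || exit 1
        exec "$@"
    )
}

# Hypothetical usage with placeholder paths:
#   run_capped 8 /path/to/minetestserver --world /path/to/world
```

With the nightly restart script already in place, a crash at the cap would take down only BlS rather than the whole host.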

fluxionary commented 5 years ago

Also, the server LS-Wonderland, also hosted on the same machine, seems to have memory-leak issues as well. However, they are much slower, and require much less frequent reboots. Their weekly reboot cycle has taken care of this all except once. This is despite LS-Wonderland being a creative server w/ few mods in common w/ blocky.

thomasrudin commented 5 years ago

@fluxionary i asked you this on the forums: is there a way you can use the same engine (minetestserver) and run this locally with your modpack?

That way you can run valgrind and the other fancy analysis tools without interrupting the main world. If there are memory leaks, they should be detectable even in singleplayer. https://stackoverflow.com/questions/5134891/how-do-i-use-valgrind-to-find-memory-leaks
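As a rough illustration of the valgrind suggestion, the wrapper below runs a command under Memcheck with full leak checking. The function name, log file name, and server paths are placeholders; the valgrind flags themselves are standard Memcheck options. Setting DRY_RUN prints the command line instead of running it.

```shell
# Run a command under valgrind's Memcheck with full leak reporting.
# %p in the log file name is expanded by valgrind to the process ID.
# If DRY_RUN is set to a non-empty value, only print the command line.
run_memcheck() {
    ${DRY_RUN:+echo} valgrind --leak-check=full --show-leak-kinds=all \
        --log-file=valgrind-%p.log "$@"
}

# Hypothetical usage with placeholder paths:
#   run_memcheck ./bin/minetestserver --world /path/to/testworld
```

Expect the server to run many times slower under Memcheck, so this is only practical on a local copy.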

> note: watchdog mod which reboots the server if it starts lagging too much https://git.rudin.io/minetest/watchdog/src/master/init.lua

This does not help your issue, that was a problem with the pathfinder.

> fancy-schmansy monitoring mod https://github.com/thomasrudin-mt/monitoring

This may help visualize things but i don't think the problem is in the lua-code...

EDIT: how do you compile the minetest code? A closer look into this might provide some insights.

fluxionary commented 5 years ago

@thomasrudin My earlier attempt to run valgrind was fruitless, because (1) it slowed the game down too much to do anything and (2) my local server doesn't get the usage blocky does - I don't have a ton of players w/ large factories or large buildings, and the leak only really shows up when the server's been busy for a long time. I've got some ideas on how to get around those issues, but I've been doing other stuff. @luk3yx Can you answer the question about how the code was compiled?

fluxionary commented 5 years ago

I'm fairly certain that the issue is, at least in part, related to players running large factories, or having tons of stocked smartshops. Since Futureismine left the server, the memory leak has been much less pronounced (e.g. we can run a few days now without having to reboot...)

luk3yx commented 5 years ago

I have forgotten the exact cmake options I used, however it was something similar to this:

$ cmake . -DRUN_IN_PLACE=FALSE -DBUILD_CLIENT=FALSE -DBUILD_SERVER=TRUE
-- *** Will build version 5.1.0-dev ***
-- Using GMP provided by system.
-- Using bundled JSONCPP library.
-- Using LuaJIT provided by system.
-- cURL support enabled.
-- GetText enabled; locales found: be;ca;cs;da;de;dv;eo;es;et;fr;he;hu;id;it;ja;jbo;kk;kn;ko;ky;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk;zh_CN;zh_TW
-- Freetype enabled.
-- ncurses console enabled.
-- PostgreSQL backend enabled
-- PostgreSQL includes: /usr/include/postgresql;/usr/include/postgresql/10/server
-- LevelDB backend enabled.
-- Redis backend enabled.
-- SpatialIndex not found!
-- Locale blacklist applied; Locales used: ca;cs;da;de;dv;eo;es;et;fr;hu;id;it;ja;jbo;kk;kn;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk
-- Configuring done
-- Generating done
-- Build files have been written to: /[...]/minetest5

Although LevelDB and PostgreSQL support is enabled, the server uses map.sqlite.

fluxionary commented 4 years ago

I haven't monitored this whatsoever since the move to multicraft; I'm curious to see if anything's changed.

fluxionary commented 4 years ago

yup, it's still a thing. about 9.5x as much memory as the next largest server

top - 23:44:23 up 276 days,  8:17,  3 users,  load average: 0.71, 0.71, 0.78
Tasks: 300 total,   1 running, 298 sleeping,   1 stopped,   0 zombie
%Cpu(s):  9.2 us,  0.6 sy,  0.0 ni, 90.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  32182.8 total,    260.6 free,  20330.5 used,  11591.7 buff/cache
MiB Swap:   1533.0 total,      0.0 free,   1533.0 used.   3078.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                         
2992953 billys    20   0   13.9g  13.5g  11372 S  56.8  42.8   1232:14 multicraftserve

fluxionary commented 2 years ago

recent data:

29790 billys    20   0   20.6g  20.1g  49308 S  53.0  15.9   4108:20 minetestserver                                                    
30472 noah      20   0   10.9g  10.3g  20296 S  21.9   8.2   1523:49 multicraftserve                                                   
13600 noah      20   0 4137868   2.4g  28512 S  39.1   1.9 657:11.68 minetestserver                                                    
24666 srinivas  20   0 4224976   2.3g  11272 S  10.9   1.8  27663:00 minetestserver                                                    
19284 ivan      20   0 2695100   2.0g  17284 S  64.6   1.6 123:57.73 multicraftserve                                                   
 2436 prismo    20   0 2532124   1.9g  11032 S  45.0   1.5   1779:54 multicraftserve                                                   
24656 prismo    20   0 2452060   1.9g  17100 S  79.1   1.5 161:49.40 multicraftserve                                                   
20120 trainta+  20   0 3792680   1.9g  24024 S  34.4   1.5   5462:34 minetestserver                                                    
15691 pteroda+  20   0 2347576   1.7g   6548 S  55.6   1.4 832:45.12 multicraftserve                                                   
14715 medic     20   0 3169040   1.5g  83928 S   9.9   1.2 769:58.06 minetestserver                                                    
31774 1hit      20   0 2055988   1.5g   8224 S   1.3   1.2 448:11.16 multicraftserve                                                   
28870 pteroda+  20   0 1922180   1.3g   6436 S   2.0   1.0 770:45.23 multicraftserve                                                   
 1355 kiwi      20   0 1745572   1.2g  50828 S  21.5   0.9  31:41.34 multicraftserve                                                   
10388 pteroda+  20   0 1765456   1.2g   8076 S   5.0   0.9  68:35.63 minetestserver                                                    
16579 billys    20   0 1466976   1.1g   7804 S   0.7   0.9  45:13.00 minetestserver                                                    
17404 cg        20   0 2968960   1.0g   7628 S   1.3   0.8 797:43.55 multicraftserve                                                   
12378 cora      20   0 1391004 997956   5556 S   0.7   0.8 135:54.42 minetest                                                          
 3679 pteroda+  20   0 1531820 975840   6256 S   1.3   0.7 978:06.56 multicraftserve                                                   
14744 kako      20   0 1415400 932648  16220 S  19.9   0.7  91:41.86 multicraftserve                                                   
16087 santy     20   0 2712036 921724  72796 S  14.6   0.7 308:25.68 multicraftserve  

no clear reasons why the bls server process is restarted in the logs. this is absolutely still an issue