Open fluxionary opened 5 years ago
I talked to Terumoc, and he had some insight into this. He told me to run collectgarbage("count") to get a measure of how much memory Lua is using. This has shown that the issue is not a mod: Lua memory usage remains around 120-150 MB no matter how much memory the server is chewing up (there's a button in the Admin HQ).
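To put the Lua figure next to what the operating system sees, one option is to read the server's resident set size from /proc. This is only a minimal sketch, assuming a Linux host; the function name rss_kib and the process name in the usage comment are illustrative, not taken from the server setup.

```shell
# Print the resident set size (in KiB) of a process, read from /proc (Linux only).
# collectgarbage("count") also reports KiB, so the two numbers compare directly.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Usage (assumes the server process is named minetestserver):
#   rss_kib "$(pgrep -o minetestserver)"
```

If the Lua count stays flat while this number keeps growing, the growth is happening outside the Lua heap.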
He also reported that this is apparently a known issue in minetest 5.0+, having something to do with pathing for entities. I haven't found the reference to that issue yet, but I'll put it here when I can.
note: watchdog mod which reboots the server if it starts lagging too much https://git.rudin.io/minetest/watchdog/src/master/init.lua
fancy-schmansy monitoring mod https://github.com/thomasrudin-mt/monitoring
I'll see what I can find in the core game on entity pathing. Perhaps I can help fix the known issue.
I've found that this might be caused by our use of a flat file for our authentication DB: https://github.com/minetest/minetest/pull/7279 Note that updating is already tracked as #33.
@Billy-S : At your convenience, could you migrate the auth file to SQLite? I'd do it myself, but I can't shut down the server without it starting up again immediately...
And as a further note, while the nightly reboots are annoying, we can get by for another couple weeks if you're busy.
Can a backup be made today? I can migrate the database, but I don't want to run the risk of it going wrong.
@xerox123official Backup of what? I've discovered I can use the ".backup" sqlite command to back up most of the DBs while the server is running, but I can't do that for the main map, because it's constantly locked. Backing up the auth file should be quite easy, though, since it's a flat file.
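For reference, the ".backup" approach mentioned above can be wrapped in a small helper. This is a sketch, not the existing backup script: it assumes the sqlite3 command-line shell is installed, and the function name and the paths in the usage comment are hypothetical.

```shell
# Copy a live SQLite database with the sqlite3 shell's .backup command.
# Assumes the destination path contains no single quotes.
backup_sqlite() {
    src=$1
    dest=$2
    sqlite3 "$src" ".backup '$dest'"
}

# Usage (hypothetical paths):
#   backup_sqlite /path/to/world/players.sqlite /path/to/backups/players.sqlite
```

The flat auth file needs nothing this elaborate; a plain cp is enough for it.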
While we're on the topic, it'd be great if we had a regular backup procedure again..
To stop the server from starting up again, just kill the restart script (it's called something like start_mt.sh), then shut down the server. When you want to restart the server, just run that script and fork it to the background with ./path/to/script.sh &
@xerox123official Luk gave me sudo access to run Billy's backup script, which I think does all that stuff itself, correct? I plan on running it around 5AM UTC when the server is quietest. I'll post a note when I do.
Yup, you can schedule it in Billy's crontab or something
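Scheduling the script in a crontab could look roughly like the following. It is a sketch only: the script path is hypothetical, the function names are made up for illustration, and it assumes the host clock is set to UTC, since cron schedules in the system time zone.

```shell
# Print a crontab entry that runs a script daily at 05:00.
# cron uses the system time zone, so this is 05:00 UTC only if the host clock is on UTC.
backup_cron_entry() {
    printf '0 5 * * * %s\n' "$1"
}

# Append the entry to the current user's crontab, keeping existing entries.
install_backup_cron() {
    ( crontab -l 2>/dev/null; backup_cron_entry "$1" ) | crontab -
}

# Usage (hypothetical script path), run as the user who owns the server:
#   install_backup_cron /path/to/backup.sh
```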
To be clear: They can only use sudo to run the script, nothing else.
The migration went well, we'll have to check back in a day to see if memory is still leaking though.
I'll see if I can change the backup script so that it doesn't require sudo (I didn't think it did).
It seems that the latest backup is from July 17 (there are three of them from that date, actually), so at least the script works when run manually.
Hmm, it appears that I can't change the script to not need sudo; the script needs to be run as me so that minetest runs under my name. If it doesn't run under my name, it won't work next time. For now, sudo -u billys <command> should be fine.
The memory leak has not gone away, and if anything, it is worse than before. The server had chewed up 20GB of memory in 24 hours when it was rebooted.
I've created a topic on the minetest forums to solicit help: https://forum.minetest.net/viewtopic.php?f=6&t=22882
Something to note, I run many of the mods of this server on my family's local server for the game, and I have never once had a memory leak issue.
My guess then is that the build for Linux (I'm running Windows) may have a memory leak, or one of the dependencies introduces one.
I will take it upon myself to collect all the mods from the bls_mods page (are there others besides lasers I should be aware of?) and put them into a test client (5.1) and run that on both Windows (7) and Debian 18 LTS.
I'm unsure if I'll find anything, but if I don't, then the only code this could be coming from would be the net code, which is something I cannot easily test myself.
No need to "collect" all the mods - clone the bls_mods repo, then run "git submodule update --recursive --init". Thanks for your effort, but I don't think you'll find anything if the server isn't heavily using all the available mods. I've never noticed the leak in my local clone of the server, but (1) I don't have the BlS map (2) I never run the server for all that long...
Also note that recently LS-Wonderland, on the same host but running minetest 0.4.17.1, has been experiencing a memory leak as well, though it is much slower than ours.
Have you ever tried switching the engine, i.e. recompiling everything? Or what happens when using another map?
Also, if it's at spawn, why not just switch to the new spawn? It looks finished.
@Niwla23 The issue has followed us through at least a couple upgrades to the minetest engine. I've tried replicating the issue on my local world, but I've had no luck. Switching to new spawn will have exactly no impact on this issue.
Update: 1) This seems to really ramp up when more players who run large techpack/terumet factories are active. We need to investigate why. 2) @Billy-S we could still mitigate the issue through use of ulimit. We should make sure that our software sucking up memory can't crash other software running on the same hardware (see the earlier links to stackexchange for instructions). I'd suggest limiting its memory usage to 8 GB.
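One way to apply the 8 GB cap is to set the limit in a subshell just before starting the server. This is a minimal sketch, assuming a shell whose ulimit -v takes KiB (bash and dash do); the function names and the server path are illustrative. Note that ulimit -v caps virtual address space rather than resident memory, so at the cap the server would most likely fail an allocation and exit instead of starving the rest of the host.

```shell
# Convert GiB to KiB, the unit ulimit -v expects.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Run a command with its virtual address space capped at the given number of GiB.
# The limit is set inside a subshell, so the calling shell is unaffected.
run_with_mem_cap() {
    cap_gib=$1
    shift
    (
        ulimit -v "$(gib_to_kib "$cap_gib")" || exit 1
        exec "$@"
    )
}

# Usage (hypothetical paths):
#   run_with_mem_cap 8 ./bin/minetestserver --world /path/to/world
```

Combined with the restart script, hitting the cap would then look like an ordinary crash and restart of this one server.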
Also, the server LS-Wonderland, also hosted on the same machine, seems to have memory-leak issues as well. However, they are much slower and require much less frequent reboots: their weekly reboot cycle has taken care of this every time but once. This is despite LS-Wonderland being a creative server w/ few mods in common w/ blocky.
@fluxionary I asked you this on the forums: is there a way you can use the same engine (minetestserver) and run this locally with your modpack?
That way you can run valgrind and the other fancy analysis tools without interrupting the main world.
If there are memory leaks, they should be detectable even in singleplayer.
https://stackoverflow.com/questions/5134891/how-do-i-use-valgrind-to-find-memory-leaks
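A local valgrind run along those lines might be wrapped like this. It is a sketch under assumptions: the binary and world paths are hypothetical, the function name is made up, and the server will run far slower than normal under memcheck.

```shell
# Run a local test server under valgrind's memcheck and write the leak report to a file.
# The leak summary is only written when the server exits, so shut it down cleanly.
run_under_valgrind() {
    server_bin=$1
    world_dir=$2
    log_file=${3:-valgrind-minetest.log}
    valgrind --leak-check=full --log-file="$log_file" \
        "$server_bin" --world "$world_dir"
}

# Usage (hypothetical paths):
#   run_under_valgrind ./bin/minetestserver /path/to/test-world
```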
> note: watchdog mod which reboots the server if it starts lagging too much https://git.rudin.io/minetest/watchdog/src/master/init.lua
This does not help with your issue; that was a problem with the pathfinder.
> fancy-schmansy monitoring mod https://github.com/thomasrudin-mt/monitoring
This may help visualize things, but I don't think the problem is in the Lua code...
EDIT: How do you compile the minetest code? A closer look into this might provide some insights.
@thomasrudin My earlier attempt to run valgrind was fruitless, because (1) it slowed the game down too much to do anything and (2) my local server doesn't get the usage blocky does - I don't have a ton of players w/ large factories or large buildings, and the leak only really shows up when the server's been busy for a long time. I've got some ideas on how to get around those issues, but I've been doing other stuff. @luk3yx Can you answer the question about how the code was compiled?
I'm fairly certain that the issue is, at least in part, related to players running large factories, or having tons of stocked smartshops. Since Futureismine left the server, the memory leak has been much less pronounced (e.g. we can run a few days now without having to reboot...)
I have forgotten the exact cmake options I used, however it was something similar to this:
$ cmake . -DRUN_IN_PLACE=FALSE -DBUILD_CLIENT=FALSE -DBUILD_SERVER=TRUE
-- *** Will build version 5.1.0-dev ***
-- Using GMP provided by system.
-- Using bundled JSONCPP library.
-- Using LuaJIT provided by system.
-- cURL support enabled.
-- GetText enabled; locales found: be;ca;cs;da;de;dv;eo;es;et;fr;he;hu;id;it;ja;jbo;kk;kn;ko;ky;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk;zh_CN;zh_TW
-- Freetype enabled.
-- ncurses console enabled.
-- PostgreSQL backend enabled
-- PostgreSQL includes: /usr/include/postgresql;/usr/include/postgresql/10/server
-- LevelDB backend enabled.
-- Redis backend enabled.
-- SpatialIndex not found!
-- Locale blacklist applied; Locales used: ca;cs;da;de;dv;eo;es;et;fr;hu;id;it;ja;jbo;kk;kn;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk
-- Configuring done
-- Generating done
-- Build files have been written to: /[...]/minetest5
Although LevelDB and PostgreSQL support is enabled, the server uses map.sqlite.
I haven't monitored this whatsoever since the move to multicraft; I'm curious to see if anything's changed.
Yup, it's still a thing: about 9.5x as much memory as the next largest server.
top - 23:44:23 up 276 days, 8:17, 3 users, load average: 0.71, 0.71, 0.78
Tasks: 300 total, 1 running, 298 sleeping, 1 stopped, 0 zombie
%Cpu(s): 9.2 us, 0.6 sy, 0.0 ni, 90.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 32182.8 total, 260.6 free, 20330.5 used, 11591.7 buff/cache
MiB Swap: 1533.0 total, 0.0 free, 1533.0 used. 3078.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2992953 billys 20 0 13.9g 13.5g 11372 S 56.8 42.8 1232:14 multicraftserve
recent data:
29790 billys 20 0 20.6g 20.1g 49308 S 53.0 15.9 4108:20 minetestserver
30472 noah 20 0 10.9g 10.3g 20296 S 21.9 8.2 1523:49 multicraftserve
13600 noah 20 0 4137868 2.4g 28512 S 39.1 1.9 657:11.68 minetestserver
24666 srinivas 20 0 4224976 2.3g 11272 S 10.9 1.8 27663:00 minetestserver
19284 ivan 20 0 2695100 2.0g 17284 S 64.6 1.6 123:57.73 multicraftserve
2436 prismo 20 0 2532124 1.9g 11032 S 45.0 1.5 1779:54 multicraftserve
24656 prismo 20 0 2452060 1.9g 17100 S 79.1 1.5 161:49.40 multicraftserve
20120 trainta+ 20 0 3792680 1.9g 24024 S 34.4 1.5 5462:34 minetestserver
15691 pteroda+ 20 0 2347576 1.7g 6548 S 55.6 1.4 832:45.12 multicraftserve
14715 medic 20 0 3169040 1.5g 83928 S 9.9 1.2 769:58.06 minetestserver
31774 1hit 20 0 2055988 1.5g 8224 S 1.3 1.2 448:11.16 multicraftserve
28870 pteroda+ 20 0 1922180 1.3g 6436 S 2.0 1.0 770:45.23 multicraftserve
1355 kiwi 20 0 1745572 1.2g 50828 S 21.5 0.9 31:41.34 multicraftserve
10388 pteroda+ 20 0 1765456 1.2g 8076 S 5.0 0.9 68:35.63 minetestserver
16579 billys 20 0 1466976 1.1g 7804 S 0.7 0.9 45:13.00 minetestserver
17404 cg 20 0 2968960 1.0g 7628 S 1.3 0.8 797:43.55 multicraftserve
12378 cora 20 0 1391004 997956 5556 S 0.7 0.8 135:54.42 minetest
3679 pteroda+ 20 0 1531820 975840 6256 S 1.3 0.7 978:06.56 multicraftserve
14744 kako 20 0 1415400 932648 16220 S 19.9 0.7 91:41.86 multicraftserve
16087 santy 20 0 2712036 921724 72796 S 14.6 0.7 308:25.68 multicraftserve
There are no clear reasons in the logs for why the BlS server process is restarted. This is absolutely still an issue.
BlS is leaking memory, which has so far resulted in taking out the entire server instance (not just BlS) at least once.
Item 2 can be achieved by using ulimit or something to the same effect. The following (old) links have descriptions of several solutions: