Barbosik / MultiOgar

An open source Ogar server implementation, written with Node.js.

Auto Restart #217

Open inxlio opened 8 years ago

inxlio commented 8 years ago

After a while, the servers accumulate too much RAM: within 2 hours mine reaches 12 GB of usage on an 80 Mbps connection.

Please add an autoRestartTime option, thanks.

GREEB commented 8 years ago

Hello

There is a memory leak in the stats; read this to fix it.

You could use something like this to make it auto-restart.

Start the server with this wrapper (this is important so the server auto-restarts):

```bash
#!/bin/bash
while ! node MultiOgar; do
  sleep 1
  echo "Restarting MultiOgar..."
done
```

Save this as whateveryouwant.sh, then run chmod +x whateveryouwant.sh.

Start the script (which starts MultiOgar) with ./whateveryouwant.sh, or run it in a screen.

Then make the auto-kill script (note: it's Perl, not bash):

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Proc::ProcessTable;

my $table = Proc::ProcessTable->new;

for my $process (@{$table->table}) {

    # skip anything other than Passenger application processes
    #next unless $process->fname eq 'ruby' and $process->cmndline =~ /\bRails\b/;

    # skip any process using less than ~4 GB
    next if $process->rss < 4_103_741_824;

    # document the slaughter
    (my $cmd = $process->cmndline) =~ s/\s+\z//;
    print "Killing process: pid=", $process->pid, " uid=", $process->uid, " rss=", $process->rss, " fname=", $process->fname, " cmndline=", $cmd, "\n";

    # try first to terminate the process politely
    kill 15, $process->pid;

    # wait a little, then kill ruthlessly if it's still around
    sleep 5;
    kill 9, $process->pid;
}
```

Save this as whateveryouwantbutnotthesame.sh, then run chmod +x whateveryouwantbutnotthesame.sh.

Be sure to have Perl and the Proc::ProcessTable module installed.

Warning: this is really bad if you have other things running on your server, because it just kills everything over 4 GB, even root processes (if you want to skip root processes, add this: next if $process->uid == 0 or $process->gid == 0;).

Change this line accordingly: next if $process->rss < 4_103_741_824; as written, it kills every process that uses more than 4 GB.

Run crontab -e and add a line to run the auto-kill script every 20 minutes or so: */20 * * * * /home/multiogar/whateveryouwantbutnotthesame.sh

Then save and exit.

I haven't really tested it exactly like this, but I'm using a similar method to keep our servers alive.

sauce

Have fun testing this Greeb -Agarlist

@Barbosik I had to make the agarlist servers accept connections only from our client because of all the bots. Just a heads up, since you have our servers in the readme; you may want to remove the direct IPs.

BaumanDev commented 8 years ago

A .sh file? So this method doesn't work on Windows?

David1ali12 commented 8 years ago

Adding bots can also cause a memory leak: just adding 70 bots and leaving them for 2 hours makes the server laggy as hell.

inxlio commented 8 years ago

windows please

acydwarp commented 8 years ago

@GREEB Killing all active processes above 4GB RAM usage on a machine is a very dangerous practice.

There is a little gem called PM2, a widely known process manager for Node.js. We can use its auto-restart functionality either programmatically or through a config file. I'll show the config-file route, because it's easier for people to follow.

  1. Install PM2 globally by running npm install pm2 -g
  2. Create a new file under the /src folder named process.json with the following content:

```json
{
  "apps" : [{
    "name": "MultiOgar",
    "script": "index.js",
    "instances": 1,
    "exec_mode": "fork",
    "autorestart": true,
    "max_memory_restart": "200M",
    "env_production": {"NODE_ENV": "production"},
    "interpreter_args": "--nouse-idle-notification --max-old-space-size=4096"
  }]
}
```

  3. Launch MultiOgar with pm2 start process.json --env production

That's all! PM2 will now auto-restart MultiOgar when it crashes, and likewise when it exceeds 200 MB of RAM usage. By passing --env production we ensure that Node.js runs in production mode. The flag --nouse-idle-notification turns off idle garbage collection notifications (which otherwise make the GC run constantly), and --max-old-space-size increases the heap memory limit of the instance (remove the flag if you don't have enough memory).
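For completeness, the same restart policy can also be set up programmatically through PM2's Node.js API. This is only a minimal sketch under a few assumptions: pm2 is installed as a local dependency (npm install pm2), index.js is the entry point, and the guarded require just lets the snippet degrade gracefully where pm2 isn't present.

```javascript
// Programmatic alternative to the process.json route (a sketch, not the
// recommended setup; the config mirrors the file shown above).
const appConfig = {
  name: 'MultiOgar',
  script: 'index.js',
  exec_mode: 'fork',
  autorestart: true,
  max_memory_restart: '200M',   // restart once memory exceeds 200 MB
  node_args: '--nouse-idle-notification --max-old-space-size=4096',
  env: { NODE_ENV: 'production' }
};

let pm2;
try {
  pm2 = require('pm2');
} catch (e) {
  pm2 = null; // pm2 not installed; appConfig still documents the settings
}

if (pm2) {
  pm2.connect((err) => {
    if (err) throw err;
    pm2.start(appConfig, (startErr) => {
      pm2.disconnect(); // detach from the daemon; it keeps MultiOgar alive
      if (startErr) throw startErr;
    });
  });
}
```

Either way, `pm2 list` shows the managed process afterwards.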

GREEB commented 8 years ago

@acydwarp Thanks for the comment Acyd. Yeah dangerous practices is what i do 👎 .

BaumanDev commented 8 years ago

@acydwarp It's giving me minimatch problems. It says I need to upgrade to 3.2.0 or higher. Also, when I start the server, it just automatically kills it. I don't even get to read the error.

inxlio commented 8 years ago

@acydwarp You can only run 1 process with pm2..

BaumanDev commented 8 years ago

Increase instances

BaumanDev commented 8 years ago

What else does PM2 do but auto restart?

acydwarp commented 8 years ago

@inxlio The way to run multiple processes is by cloning MultiOgar multiple times and adding objects to the apps array in the process.json file. You can list the running processes by executing pm2 list.

@NatsuTheGreat If you're having issues, upgrade Node.js to the latest stable version, 6.3.0. To learn more about what PM2 can do, read the official docs.

BaumanDev commented 8 years ago

I am using node v6.3.0 ;/

inxlio commented 8 years ago

@acydwarp I cant understand

BaumanDev commented 8 years ago

@GREEB There is a serious problem with agarlist. You have to shoot mass to move; it happened randomly. I lost all 200k mass because of this :/ (instant merge)

Barbosik commented 8 years ago

There really is some memory leak: after 4 days of running, my servers crashed with out-of-memory errors :) One server consumed 800 MB before the crash. Actually, I was hitting it with stress testing at that moment :) I've added some logging to find where exactly the memory leak is.
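Logging like that can be as simple as sampling process.memoryUsage() on a timer. A minimal sketch, not the actual logging used here; the interval and the interpretation comments are my own assumptions:

```javascript
// Minimal periodic memory logger using only Node built-ins.
function formatMB(bytes) {
  return (bytes / 1024 / 1024).toFixed(1) + ' MB';
}

function logMemory() {
  const mem = process.memoryUsage();
  console.log(
    'rss=' + formatMB(mem.rss) +            // total resident memory
    ' heapUsed=' + formatMB(mem.heapUsed) + // live JS objects
    ' heapTotal=' + formatMB(mem.heapTotal) // memory V8 has reserved
  );
}

// e.g. sample once a minute; heapUsed that keeps climbing between full
// GCs usually points at a leak in JS code, while rss growing on its own
// points at native allocations (such as ws buffers).
// setInterval(logMemory, 60 * 1000);
logMemory();
```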

BaumanDev commented 8 years ago

Have you found out the memory leak source?

acydwarp commented 8 years ago

Underlying Node modules can also be a source of memory leaks. The ws module for example has a bunch of memory leaks as pointed out in this PR: https://github.com/websockets/ws/pull/734. I would also take the advice from this PR and clean up MultiOgar's self = this by using arrow functions instead.

BaumanDev commented 8 years ago

var self = this has memory leaks?

BaumanDev commented 8 years ago

Arrow functions seem cool. Just why does var self = this cause problems? Does it cause any issues with uws?

acydwarp commented 8 years ago

The PR points out that the use of self = this leads to memory leaks when used in closures. With arrow functions, this remains bound to its original context, so it eliminates the need for self = this.
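A small illustration of that difference in this binding (the Timer type here is hypothetical, just for demonstration):

```javascript
// Why arrow functions remove the need for `var self = this`:
// a plain function callback gets its own `this`; an arrow function
// inherits the `this` of the enclosing method.
function Timer(name) {
  this.name = name;
}

// Old pattern: capture `this` in a closure variable.
Timer.prototype.startOld = function () {
  var self = this;                 // extra reference kept alive by the closure
  return function () { return self.name; };
};

// Arrow function: `this` is lexical, no extra variable needed.
Timer.prototype.startNew = function () {
  return () => this.name;
};

const t = new Timer('update');
console.log(t.startOld()()); // "update"
console.log(t.startNew()()); // "update"
```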

GC is a complex process, and even valid code can cause memory leaks. One can pass the --expose-gc flag to Node and control garbage collection manually to keep the app from crashing, but GC is generally not something you want to trigger, since it blocks the app from executing while it's doing its job. The best thing to do is to trace down the leaks.

Barbosik commented 8 years ago

@NatsuTheGreat: no, var self = this; doesn't lead to memory leaks. But closures can.

BaumanDev commented 8 years ago

I see now. But does it cause issues with uws? That project is on ws.

Barbosik commented 8 years ago

I don't know. There is a small memory leak with ws. I just got a dump from the test server with a lot of players (bubble-wars.tk:443). I will investigate where exactly the memory leak happens.

BaumanDev commented 8 years ago

@Barbosik Alright. Good luck.

@acydwarp Do you know if var self = this; causes issues with uws? I am using uws. I do see quite a lot of var self = this;. I used var self = this.gameServer;, but that shouldn't matter because I didn't have closures.

acydwarp commented 8 years ago

It's unrelated to uws; uws is C++ code with a Node.js interface, which allows it to run more efficiently than ws. I noticed a 5% drop in CPU consumption when I switched from ws to uws, using 200 bots to benchmark.

But CPU consumption is actually fine with MultiOgar; it's the memory allocation that's the main issue and causes most of the lag.

Barbosik commented 8 years ago

Yes, sometimes it has random peaks in update time, up to 80-100 ms. I have no idea what the reason for such peaks is; probably it happens when someone is multisplitting. But I spectated big players, and it doesn't always happen when someone makes a multisplit. I think it's protocol lag; I need to catch it.

Barbosik commented 8 years ago

At a glance, the memory leak occurs inside the ws code. It allocates the largest part of memory. I didn't find other memory leaks in the MultiOgar dump. I need to wait some time to get a snapshot where the memory leak is noticeable.

acydwarp commented 8 years ago

The random peak in update time is most likely caused by the garbage collector. The GC stores a list of objects and uses an algorithm to decide when to traverse the list and clean up the objects. You can keep this from happening too often by passing the flag --nouse-idle-notification, which disables V8's idle garbage collector. You can also pass the flag --max-old-space-size to Node and increase its value, but you'll need a bunch of available RAM for this (by default it's set at 1.4 GB for 64-bit systems). It's best to avoid GC as much as possible because it hinders performance in real-time applications.

Barbosik commented 8 years ago

@acydwarp: The random peak in update time is most likely caused by the garbage collector.

No, it may work very smoothly for a very long time, even when there are a lot of players on the server and they are using multisplit. But sometimes it runs with peaks, and they happen when someone makes a multisplit. But not always...

I think it may be related to player names, because a multisplit produces a large cell-update packet that consists of a lot of strings with the cell name... It probably leads to memory allocations in the ws module, but I'm not sure.

BaumanDev commented 8 years ago

@acydwarp I have implemented arrow functions and got rid of self = this. I'll be watching to see how memory goes.

Barbosik commented 8 years ago

An "arrow function" is a lambda expression. It's the same as a function, just new syntax. I think there is no difference for the compiler, and both variants will be compiled to the same code.

BaumanDev commented 8 years ago

Yes and that new syntax got rid of self = this; and closures.

Barbosik commented 8 years ago

@NatsuTheGreat: got rid of closures

Actually, an "arrow function" is nothing else than a closure :smile: "=>" is standard syntax for closure and lambda definitions; see C# for example. It's just an alias for function.

There is just one advantage to using "=>": the code is shorter and easier to read. But this syntax is not supported on legacy node.js.

Barbosik commented 8 years ago

Found the location of the random lag peaks. It's the client update, so it's still a protocol bottleneck. It can take up to 60 ms, while updateMoveEngine takes at most 9 ms (tested with 40 online players).

BaumanDev commented 8 years ago

How do you find out how many ms a function takes? I want to know because I refactored the collisions, and I want to see if it got faster/slower.

Barbosik commented 8 years ago

With process.hrtime and some hand-made logic that lets me catch the lag moment and measure the execution time of the code I'm interested in. I call process.hrtime at key points and store the values. When the loop completes, I check whether there was a lag; if so, I call a method that traces the stored values and calculates which fragment of the code consumed too much time. I monitor my server with my desktop client, which shows an update-interval graph in realtime, so I can see when there is a lag. In addition, I added some debug commands that let me reset the timing values on a false lag detection.

Actually, collision works pretty fast: it takes at most 10-16 ms on a server with 50+ players. The problem is the client update. It sometimes takes up to 60-80 ms, which leads to the lag peaks.
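The measuring pattern described above can be sketched roughly like this (a generic example, not MultiOgar's actual instrumentation; the 25 ms threshold and the function names in the usage comment are made up):

```javascript
// Generic sketch of timing a code section with process.hrtime.
function measure(label, fn) {
  const start = process.hrtime();          // [seconds, nanoseconds]
  const result = fn();
  const diff = process.hrtime(start);      // elapsed time since `start`
  const ms = diff[0] * 1000 + diff[1] / 1e6;
  if (ms > 25) {                           // only log suspicious spikes
    console.log(label + ' took ' + ms.toFixed(2) + ' ms');
  }
  return result;
}

// Usage inside a game loop tick could look like:
// measure('updateMoveEngine', () => gameServer.updateMoveEngine());
// measure('updateClients',    () => gameServer.updateClients());
const sum = measure('demo', () => {
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += i;
  return total;
});
console.log(sum); // 499999500000
```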

Barbosik commented 8 years ago

I think the memory leak is related to disconnected players. It happens when someone runs a DDoS attack with minions. Disconnection takes a lot of time, so it eats memory.

makandz commented 8 years ago

@Barbosik I think that could be true. I have had a server with over 10,000 disconnects, and quite a lot of players disconnected. It hurt server performance a lot compared to a freshly started server with about the same number of players.

F0RIS commented 8 years ago

https://github.com/F0RIS/Ogar-servers-manager