AndreasMadsen / trace

Creates super long stack traces
https://trace.js.org
MIT License
194 stars 18 forks

Out of memory #10

Closed: iliakan closed this issue 9 years ago

iliakan commented 10 years ago

A follow-up to #9, so that we don't forget it.

With the latest trace enabled, the Node.js process runs for some time, eats more and more memory, then dies with an out-of-memory message.

AndreasMadsen commented 10 years ago

Thanks, I will think about this.

AndreasMadsen commented 10 years ago

Could you try the better-memory branch? It should use less memory, though I would still expect memory usage to increase. I also can't wrap my head around how the V8 GC treats old callSite objects.

Also, if you could compare runs with and without the above patch, that would be awesome. A way to log memory is shown below:

    var path = require('path');
    var fs = require('fs');

    // Append a sample of process memory usage (rss, heapTotal, heapUsed)
    // to a CSV file every 200 ms.
    var csvPath = path.resolve(__dirname, 'worker-benchmark.csv');
    var out = fs.createWriteStream(csvPath);

    setInterval(function() {
        var time = Date.now();
        var memo = process.memoryUsage();
        out.write(
            time + ',' +
            memo.rss + ',' +
            memo.heapTotal + ',' +
            memo.heapUsed + '\n'
        );
    }, 200);

AndreasMadsen commented 10 years ago

Forgot to say: you should set Error.stackTraceLimit to something like 100 for the patch to make any difference. Just not Infinity.
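
A minimal sketch of that setup, assuming the limit is set right where trace is loaded (the entry-point placement is an assumption, not part of the original comment):

    // Cap how many frames V8 captures per stack trace; 100 is the value
    // suggested above. Infinity would keep every frame and defeat the patch.
    Error.stackTraceLimit = 100;
    require('trace');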

iliakan commented 10 years ago

I've run the app and intentionally did nothing with it; it was just sitting idle with Error.stackTraceLimit = 1000.

Eventually it ran out of memory:

...
[08:28:30] Live reload server listening on: 35729
[gulp] [nodemon] v1.2.1
[gulp] [nodemon] to restart at any time, enter `rs`
[gulp] [nodemon] watching: /js/javascript-nodejs/hmvc/**/* /js/javascript-nodejs/modules/**/*
[gulp] [nodemon] starting `node --debug --harmony ./bin/server`
Debugger listening on port 5858
js-bson: Failed to load c++ bson extension, using pure JS version
[08:28:31] livereload: deferred listen on change public/{i,img,js,styles}/**/*.*
04:28:32.092Z  INFO app: App listen 0.0.0.0:3000
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
./gulp: line 2: 60648 Abort trap: 6           NODE_ENV=development node --harmony `which gulp` $*

The setInterval output is here: http://ilyakantor.ru/worker-benchmark.csv

iliakan commented 10 years ago

P.S. I took the better-memory branch and installed it manually (removing the old trace first).

AndreasMadsen commented 10 years ago

The data suggests that there is no memory leak. Is it possible that you are forking, such that another process also contains require('trace'), and it is that process that runs out of memory?

[plot attachment: rplot03]
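
A hypothetical illustration of the scenario being asked about (the gulp/nodemon task spawning the server is an assumption about this particular setup, not something confirmed in the thread):

    // If a parent process (e.g. a gulp/nodemon task) forks the server entry
    // point, the child runs the same code, so it also executes
    // require('trace') and, with the logging snippet above, writes to the
    // same worker-benchmark.csv.
    var child_process = require('child_process');
    child_process.fork('./bin/server');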

iliakan commented 10 years ago

Maybe the leak is inside the node.js process. I don't use forking in the dev environment; it was just a running server.

iliakan commented 10 years ago

Would you mind installing the project and trying it?

AndreasMadsen commented 10 years ago

Could you give me a link?

AndreasMadsen commented 10 years ago

> Maybe the leak is inside the node.js process.

What do you mean? The blue bar should contain the total amount of allocated memory.

iliakan commented 10 years ago

I will repeat the test.

iliakan commented 10 years ago

I left the server running for an hour or two while I was away. It didn't serve anything.

Now I'm back and I see that it is consuming 100% CPU. It shouldn't be like that.

The htop screenshot with 100%: http://ilyakantor.ru/screen/2014-09-13_2118.png

Screenshot from activity monitor: http://ilyakantor.ru/screen/2014-09-13_2117.png

It's in Russian, but I'll translate: physical memory 1.3 GB, own memory 1.29 GB, virtual memory 4.36 GB, shared memory 11.8 MB.

I tried to stop it with SIGINT but failed; only kill -9 worked. As it was hanging, it wasn't writing the setInterval log at the time I checked. I'm not sure when it actually stopped writing it.

Here is what I have: http://ilyakantor.ru/worker-benchmark.csv

AndreasMadsen commented 10 years ago

I will think more about it then. Could you provide a link to the project so I can debug it myself?

P.S.: here is the memory output; it looks like it was written fine for well into two hours.

[plot attachment: rplot04]

iliakan commented 10 years ago

I was away for about 2 hours, so the log is quite up to date. It is strange, then, that the logged memory consumption does not match the OS-supplied data.

The project is https://github.com/iliakan/javascript-nodejs; just use node 0.11, run npm install, then ./db and then ./dev.

P.S. The code that enables (or disables) trace is in modules/config/index.js (near the top).
P.P.S. You also need MongoDB running.
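
A hypothetical illustration of what such a toggle might look like (the actual code in modules/config/index.js may differ, and the TRACE environment variable is an assumed name):

    // Enable long stack traces only when explicitly requested.
    if (process.env.TRACE) {
        Error.stackTraceLimit = 100;   // limit suggested earlier in this thread
        require('trace');
    }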

AndreasMadsen commented 9 years ago

As I suspected, you do have two processes running with trace and logging to worker-benchmark.csv. Running two processes with trace is no problem, but logging to the same file from two different processes is an issue.

I just started the server, so I can't yet say whether I can reproduce it. But if you could reduce the setup so it only runs one process, that would make it much easier for me to debug.

AndreasMadsen commented 9 years ago

I have also gotten some clarity on the GC issues, and the master branch is definitely not optimal. The better-memory branch should be able to GC the callSites that can never be used. However, the branching possibilities in node are almost endless, so trace will still take up quite a bit of memory.

iliakan commented 9 years ago

You mean I have two processes running at the same time?

AndreasMadsen commented 9 years ago

Yep

iliakan commented 9 years ago

Why does sudo ps auxww show only one server process? Does it hide something from me?

AndreasMadsen commented 9 years ago

I'm not good with ps. Check the monitor app: there is a gulp process and a node process, and both have the .csv file open. You can also add the PID to the csv filename; then you can see that two files are created.
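
A minimal sketch of that suggestion, reusing the logging snippet from earlier in the thread (the filename pattern is just an example):

    var path = require('path');
    var fs = require('fs');

    // Include the PID in the filename so each process (gulp and node) writes
    // to its own CSV instead of sharing worker-benchmark.csv.
    var csvPath = path.resolve(__dirname, 'worker-benchmark-' + process.pid + '.csv');
    var out = fs.createWriteStream(csvPath);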

iliakan commented 9 years ago

I can run the app without gulp, then. Is there a production-ready branch to use now?

AndreasMadsen commented 9 years ago

I can't recommend running with trace in production, but for now you should run with the better-memory branch.

cusspvz commented 9 years ago

Is there any news about trace's stability when used in production environments?

cusspvz commented 9 years ago

Trace seems to be a perfect tool on development/production servers for catching cases that we don't hit during development.

AndreasMadsen commented 9 years ago

I can't reproduce this issue, nor can I find any evidence that it exists, as documented by the graphs. If you have this issue, it would be totally amazing if you could help me debug it, or, if it is open source, just send a link to the codebase.

I will repeat that I can't recommend running this in production, nor do I intend to support it; it is simply not feasible. However, I definitely do support isolated production-like environments, though as this issue suggests, there may be some problems.

AndreasMadsen commented 9 years ago

My hypothesis for this issue is that it is not really trace, but that trace making everything run slower may introduce some race condition that causes a memory leak. These kinds of bugs are very difficult for me to debug, as they likely depend on the operating system, specific versions and states of the database, and deep dependencies.

AndreasMadsen commented 9 years ago

I see I never finished the better-memory branch, so in the npm-published version there is definitely a memory leak (though fixing that doesn't resolve this issue). I will work on getting it done tomorrow.

iliakan commented 9 years ago

I did not repeat the test because it was kind of a "what if" trial. I don't use trace in production, just for development, and in development I don't hit this issue anyway (things finish too fast).

AndreasMadsen commented 9 years ago

The better-memory branch is now complete, merged, and published. I will consider this bug fixed, at least until more evidence turns up.