AndreasMadsen / trace

Creates super long stack traces
https://trace.js.org
MIT License

Doesn't really work #9

Closed iliakan closed 10 years ago

iliakan commented 10 years ago

In a real app it just fails. Maybe V8 has changed. My code is big and I haven't localized the issue, but please try it in a real app.

P.S. I use Koa.JS

AndreasMadsen commented 10 years ago

I have tried trace in a real app (*), both the 0.x and 1.x versions. You should ensure that you are using node 0.11.13 (not doing so is the most common cause of issues). The cause can also be one of the native modules you might be using; some of them don't use MakeCallback (which they should), and not doing so makes them completely impossible to track.

I also remember there were some issues with redis because they use try/catch in a really nasty way. But I think I managed to resolve that.

In any case, if you want me to help, you should provide some more information. Even a huge codebase is better than nothing. Also, could you post the output of node -e "console.log(process.versions)" so we can check that we have matching versions?

(*) A big application, collecting stacktraces for one week. Though I can't recommend doing that in production.

AndreasMadsen commented 10 years ago

A print of the stack trace with and without trace would also help, even if it is not helpful to you, e.g. if it only shows node internals.

iliakan commented 10 years ago

Ah, so it's alive. I'll try a simpler app (yes, 0.11.13).

iliakan commented 10 years ago

By the way, the error was:

TypeError: Cannot read property 'slice' of undefined
    at AsyncListenerInst.asyncFunctionInitialized [as create] (/js/javascript-nodejs/node_modules/trace/trace.js:29:27)
    at runAsyncQueue (tracing.js:186:25)
    at process.nextTick (node.js:401:9)
    at onwrite (_stream_writable.js:312:15)
    at WritableState.onwrite (_stream_writable.js:103:5)
    at Socket._writeGeneric (net.js:669:5)
    at Socket._write (net.js:679:8)
    at doWrite (_stream_writable.js:265:12)
    at writeOrBuffer (_stream_writable.js:252:5)
    at Socket.Writable.write (_stream_writable.js:197:11)
AndreasMadsen commented 10 years ago

You wouldn't by any chance be manipulating the stack trace in any other way, e.g. stack-chain, clarify, longjohn, or setting .stack manually somewhere?

I can reproduce the error, but only if I call new Error().stack within the trace module itself, specifically inside Error.prepareStackTrace. This is caused by a recursion safeguard in v8, which prevents a call to Error.prepareStackTrace while already being inside Error.prepareStackTrace. I think I can safeguard around the safeguard, but I need a clear head for that, and it is also highly unlikely to be the cause of your issue, unless the stack trace is manipulated in other ways simultaneously.
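The reentrancy issue can be sketched as follows (a hedged illustration, not trace's actual code; the exact fallback behaviour varies between V8 versions):

```javascript
// While a custom Error.prepareStackTrace is running, V8's safeguard
// prevents it from being re-entered: a nested stack access inside the
// formatter falls back to default handling instead of recursing.
var nested;
var previous = Error.prepareStackTrace;
Error.prepareStackTrace = function (error, callSites) {
  nested = new Error().stack; // nested access inside the formatter
  return callSites.length;    // the return value becomes .stack
};
var depth = new Error().stack;      // triggers the custom formatter
Error.prepareStackTrace = previous; // restore the previous formatter
console.log(typeof depth, typeof nested);
```

Code that assumes the nested value is an array of call sites (like trace's err.callSite) would then see something else entirely.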

PS: if you could reinstall trace and check again, that would be awesome. Specifically, make sure that trace depends on the latest version of stack-chain (1.0.0). Try printing:

require('trace');
require('fs').writeSync(1, global._stackChain.version);

from within your application.

iliakan commented 10 years ago

The version is 1.0.0. Right now I have no idea what causes the module to fail.

I may add Object.observe somewhere to trace things or put other debug into the module file.

P.S. When I start my app with this module, it dies. Normally it works.

Right now:

[01:51:04] 'client:compile-css-once' errored after 326 ms
/js/javascript-nodejs/node_modules/trace/trace.js:29
  var trace = err.callSite.slice(2);
                          ^
TypeError: Cannot read property 'slice' of undefined
  at AsyncListenerInst.asyncFunctionInitialized [as create] (/js/javascript-nodejs/node_modules/trace/trace.js:29:27)
  at runAsyncQueue (tracing.js:167:25)
  at process.nextTick (node.js:401:9)
  at onwrite (_stream_writable.js:312:15)
  at WritableState.onwrite (_stream_writable.js:103:5)
  at Socket._writeGeneric (net.js:669:5)
  at Socket._write (net.js:679:8)
  at doWrite (_stream_writable.js:265:12)
  at writeOrBuffer (_stream_writable.js:252:5)
  at Socket.Writable.write (_stream_writable.js:197:11)
  at Socket.write (net.js:606:40)
  at Console.log (console.js:53:16)
  at Object.module.exports [as log] (/usr/local/lib/node_modules/gulp/node_modules/gulp-util/lib/log.js:8:15)
  at Gulp.<anonymous> (/usr/local/lib/node_modules/gulp/bin/gulp.js:189:11)
  at Gulp.EventEmitter.emit (events.js:107:17)
  at Gulp.Orchestrator._emitTaskDone (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/index.js:264:8)
  at /js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/index.js:275:23
  at finish (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/lib/runTask.js:21:8)
  at module.exports (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/lib/runTask.js:36:10)
  at Gulp.Orchestrator._runTask (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/index.js:273:3)
  at Gulp.Orchestrator._runStep (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/index.js:214:10)
  at /js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/index.js:279:18
  at finish (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/lib/runTask.js:21:8)
  at /js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/lib/runTask.js:52:4
  at f (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/node_modules/end-of-stream/node_modules/once/once.js:16:25)
  at DestroyableTransform.onend (/js/javascript-nodejs/node_modules/gulp/node_modules/orchestrator/node_modules/end-of-stream/index.js:31:18)
  at DestroyableTransform.EventEmitter.emit (events.js:129:20)
  at /js/javascript-nodejs/node_modules/through2/node_modules/readable-stream/lib/_stream_readable.js:965:16
  at process._tickCallback (node.js:343:11)
AndreasMadsen commented 10 years ago

From the new stack trace it appears that there was a call to console.log from /usr/local/lib/node_modules/gulp/node_modules/gulp-util/lib/log.js:8:15. Fitting this into the only theory I currently have, this log function has to be called from an Error.prepareStackTrace callback. Could you do a search for Error.prepareStackTrace calls? Since you have gulp installed globally this is a bit complicated, but try:

cd /usr/local/lib/node_modules/ && ack "Error.prepareStackTrace"
cd /js/javascript-nodejs/ && ack "Error.prepareStackTrace" 

Could you also tell me what changed since last time? The stack trace is much longer and extends beyond one tick/turn, which means that trace worked better this time. Also check that Error.stackTraceLimit is set to Infinity: just run console.log(Error.stackTraceLimit). It appears that it is only 30.

> I may add Object.observe somewhere to trace things or put other debug into the module file.

I would really appreciate it if you could do some debugging yourself; I can only guess. So here is a pro tip: don't do your extra debug prints with console.log(msg) but with fs.writeSync(1, msg). Calling console.log inside trace will cause a recursive bug, and it will appear just like this error.

> P.S. When I start my app with this module, it dies. Normally it works.

Yeah, a normal result of using trace is that a buggy application suddenly works. This is because trace spends some time in each async action, which causes some race conditions to be fixed. It may be the opposite in your case (trace causes a race condition), but I doubt it. It is more likely a bug in trace or in one of your modules that does something really nasty.

iliakan commented 10 years ago

Hi, I installed ack (didn't use it before):

/js/javascript-nodejs
> cd /usr/local/lib/node_modules/ && ack "Error.prepareStackTrace"

/usr/local/lib/node_modules
> cd /js/javascript-nodejs/ && ack "Error.prepareStackTrace"

(No results at all).

P.S. I didn't get the last statement. To clarify, the app works without trace and dies with trace. Inside gulp :/ not sure why yet.

iliakan commented 10 years ago

As of now, I can't find an offending statement. There are many calls to Error.prepareStackTrace in various modules. I guess most of them will be around for some time; they are 3rd party, I can't fix them.

But looking at the stack and examining the files, I can't find a console.log inside Error.prepareStackTrace. That would be really strange; no one wants to log things inside this function, and naturally I should see such messages, but I don't.

AndreasMadsen commented 10 years ago

Could you tell me which modules those are; are any of them used in your application? Also, it doesn't have to be console.log, it can be any call that involves an async operation. Also, you may not see output because the error happens before the output is sent.

iliakan commented 10 years ago

I used Object.observe to trace prepareStackTrace changes:

var fs = require('fs');

if (process.env.NODE_ENV == 'development' && process.env.DEV_TRACE) {
  require('trace');   // activate long stack traces
  require('clarify'); // exclude node internal calls from the stack
}

Object.observe(Error, function(changes) {
  changes.forEach(function(change) {
    fs.write(1, change.type + ' ' + change.name);
  });
});

In short: the Error object is not modified after require('trace').

Does that help?

AndreasMadsen commented 10 years ago

Try using fs.writeSync, not fs.write. fs.write is an async operation.

iliakan commented 10 years ago

I managed to reproduce the fail in a simpler app. Please check out https://github.com/iliakan/trace-fail

It dies with require('trace'); and works (albeit doesn't do anything) without it.

AndreasMadsen commented 10 years ago

Fantastic, thank you so much, I will debug it later.

iliakan commented 10 years ago

Thank you for a promising module.

AndreasMadsen commented 10 years ago

Okay, I just cloned trace-fail and executed gulpfile.js; there were no errors. By the way, I had to change the name in package.json to trace-fail and install trace in order not to get an obvious error.

iliakan commented 10 years ago

You mean, it works fine with trace?

iliakan commented 10 years ago

Here's what I get: http://ilyakantor.ru/screen/2014-09-05_0136.swf

node 0.11.13 npm 1.4.24 MacOS.

AndreasMadsen commented 10 years ago

I see it now. I executed node gulpfile.js, but should have executed gulp.

AndreasMadsen commented 10 years ago

After 2 hours I have isolated the bug down to:


require('trace');
var fs = require('fs');

var gp = {
  get stylus() {
      fs.writeSync(1, 'Loading stylus\n');
      require('coffee-script/register');
      return function () {};
  }
};

// Preloading gp.stylus makes the program work; the lazy loading
// is what triggers the bug
fs.writeSync(1, 'About to fetch\n');
console.log(gp.stylus);

I still need to do some more debugging, but at least I'm getting close. PS: the bug only occurs if stylus is lazy loaded.

AndreasMadsen commented 10 years ago

Got it. Apparently coffee-script overwrites Error.prepareStackTrace. This is normally fine, but when coffee-script gets lazy loaded (because gulp-stylus is lazy loaded), the Error.prepareStackTrace from coffee-script overwrites the one from trace. This also explains why your app worked without trace.

The fix is simple: just be nastier than the coffee-script module. This unfortunately means that the stack traces will point into .js files and not .coffee files. But I think that is okay; if coffee-script really needs source-mapped stack traces then it should interface with stack-chain.

I hope to publish the fix tomorrow, I already have a working prototype.

iliakan commented 10 years ago

Interesting. I wonder if it is possible to set Error.prepareStackTrace to the required value just before it's needed, and reset it afterwards, something like:

var backup = Error.prepareStackTrace;
Error.prepareStackTrace = my_func;
/* ... do the job ... */
Error.prepareStackTrace = backup;

AndreasMadsen commented 10 years ago

Sure, but then you won't get long stack traces. I think there is a way to get both long stack traces and make the coffee-script people happy, but it involves hooking into the Error.prepareStackTrace [[setter]].
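The [[setter]] idea could look roughly like this (a hypothetical sketch, not stack-chain's actual implementation):

```javascript
// Replace Error.prepareStackTrace with an accessor property, so later
// assignments (e.g. by coffee-script) go through a setter that trace
// can observe, instead of silently replacing trace's formatter.
var hooked = null;
Object.defineProperty(Error, 'prepareStackTrace', {
  configurable: true,
  get: function () { return hooked; },
  set: function (fn) {
    // A real implementation could compose fn with its own formatter;
    // this sketch only records the most recent assignment.
    hooked = fn;
  }
});

// A later module assigns its own formatter; the setter intercepts it.
Error.prepareStackTrace = function (err, callSites) {
  return callSites.length;
};
var frameCount = new Error().stack; // V8 reads through the getter
console.log(typeof frameCount);
delete Error.prepareStackTrace;     // remove the accessor again
```

With such a hook in place, trace could wrap the incoming formatter (e.g. for source maps) instead of losing its own.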

AndreasMadsen commented 10 years ago

The issue should be fixed now. Both my own tests and https://github.com/iliakan/trace-fail pass. The stack traces should be both long and source mapped (when using coffee-script).

iliakan commented 10 years ago

Uh, I tried it in the real app, now it hangs.


/js/javascript-nodejs
iliakan> ./dev
Another copy of stack-chain was found, using the first initialized copy
[21:03:24] Using gulpfile /js/javascript-nodejs/gulpfile.js
[21:03:24] Starting 'nodemon'...
[21:03:24] Starting 'client:livereload'... (hung)

P.S. That's with an infinite stack trace. With Error.stackTraceLimit = 200 it says RangeError: Maximum call stack size exceeded.

AndreasMadsen commented 10 years ago

If you use clarify, you should update that too.

iliakan commented 10 years ago

I don't use clarify. If I do, the same problem persists. I reinstalled all modules to the latest version.

iliakan commented 10 years ago

Hmm:

[21:19:55] Using gulpfile /js/javascript-nodejs/gulpfile.js
[21:19:55] Starting 'nodemon'...
[21:19:55] Starting 'client:livereload'...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
./gulp: line 2: 59666 Abort trap: 6           NODE_ENV=development node --harmony `which gulp` $*
iliakan commented 10 years ago

That is without clarify.

iliakan commented 10 years ago

It eats 100% CPU and consumes memory for some time (about 4G), then dies.

AndreasMadsen commented 10 years ago

Hmm, that is odd. Does https://github.com/iliakan/trace-fail work for you? When I released the bugfix in stack-chain I also got a bug report about a recursion failure. Maybe they are related. Reference: stack-chain@0580629

iliakan commented 10 years ago

trace-fail doesn't die any more.

AndreasMadsen commented 10 years ago

Hmm, got a thought.

If you run

var old = Error.prepareStackTrace // backup
/* Overwrite Error.prepareStackTrace and do some stuff */
Error.prepareStackTrace = old;

Then that might create a recursive bug the next time new Error().stack is called.

AndreasMadsen commented 10 years ago

Could you try replacing the stack-chain version in trace with this one: https://github.com/AndreasMadsen/stack-chain/commit/c4612321a0be06963a1745b669c14a4a2223d641

AndreasMadsen commented 10 years ago

Assuming that the above was the cause of the error, this issue should now be solved.

AndreasMadsen commented 10 years ago

If you could try version 1.0.5 that would be awesome.

iliakan commented 10 years ago

Sure, one moment.

iliakan commented 10 years ago

The project build hangs with the latest trace. The project itself is open source; you could install it and try it, btw. Thank you for the development.

iliakan commented 10 years ago

Sorry, it does not hang. I left it running for a while; it eats 100% CPU and works (logs the progress), just terribly slowly, like 20-50 times slower than usual.

Meanwhile I can't say it eats much memory.

AndreasMadsen commented 10 years ago

> like 20-50 times slower

That sounds about right. Unfortunately there isn't much to do about that.

iliakan commented 10 years ago

Ouch. The site works so slowly. Is there anything to be done? Maybe report it to the node.js guys?

iliakan commented 10 years ago

Maybe an optimization to collect not 100% of every stack trace, but say 5 lines?

iliakan commented 10 years ago

Maybe changes in V8 could make optimizations possible? Long stack traces have been by far one of the most requested things since node.js's beginnings. There should be attention from all parties.

AndreasMadsen commented 10 years ago

You can try Error.stackTraceLimit = 5 to simulate the performance of collecting just 5 lines per tick/turn. The .stack will only give you the last 5 call sites (lines), but I would be willing to change that if the performance is a lot better. But I highly doubt it.
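A quick way to see the effect of the limit (a minimal sketch):

```javascript
// With Error.stackTraceLimit = 5, only the last five call sites are
// collected when an Error is constructed.
Error.stackTraceLimit = 5;

function recurse(n) {
  if (n === 0) throw new Error('deep');
  recurse(n - 1);
}

try {
  recurse(20);
} catch (err) {
  // Drop the "Error: deep" message line, keeping only call-site lines.
  var frames = err.stack.split('\n').slice(1);
  console.log(frames.length); // prints 5, although the call chain is deeper
}
```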

Getting long stack traces is never going to be something you will do in production, it is only a super helpful tool once you have a bug that you can reproduce in development.

If you take a look in v8/src/messages.js you will see that all the .stack logic is actually lazy (but it is still expensive to call new Error()). It may be possible to optimize the C++ code in v8, but it will always require copying the stack in memory and binding it to the error object, and that in itself is expensive.

It should be noted that the .stack logic in this case is not completely lazy, because of how stack-chain works. I just opened an issue for that: https://github.com/AndreasMadsen/stack-chain/issues/2

iliakan commented 10 years ago

Limiting the stack trace actually helps. Error.stackTraceLimit = 100 instead of Infinity gives a 10x speedup. So the slowdown is tolerable; it kind of works.

But the total length of the stack trace is now limited to 100, so in the case of something deep it's not enough.

The reason why it's so long is the nested nature of async calls. Long stack traces differ from traditional stack traces: they keep the whole execution tree, not only the path to the current leaf.

E.g. in the case of middleware, we have A calling next(), which runs B, which calls next(), running C, etc. The stack in C will have all of them: A, B, C.

In the case of the usual sequential code A(); B(); C(); the stack trace will have only C, because A and B actually finished.
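The difference can be sketched with plain synchronous code (an illustration only; the function names and the frame parsing are ad hoc):

```javascript
// Returns the function names on the current stack, skipping the
// "Error" message line and this helper's own frame.
function callerNames() {
  return new Error().stack.split('\n').slice(2).map(function (line) {
    return line.trim().split(' ')[1]; // "at NAME (file:...)" -> NAME
  });
}

var nestedFrames, sequentialFrames;

// Middleware style: C runs while B and A are still on the stack.
function C() { nestedFrames = callerNames(); }
function B() { C(); }
function A() { B(); }
A();

// Sequential style: A2 and B2 have already returned when C2 runs.
function C2() { sequentialFrames = callerNames(); }
function A2() {}
function B2() {}
A2(); B2(); C2();

console.log(nestedFrames.slice(0, 3)); // [ 'C', 'B', 'A' ]
console.log(sequentialFrames[0]);      // C2
```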

What can we do about it?

As far as I remember, long stack trace modules (longjohn etc.) wrap every async call and remember the stack. I guess your module does the same, using the new node API, right?

Maybe it is possible to limit the stack (cut off node_modules? cut off node internal modules?) that is remembered at the "capture current stack" stage?

AndreasMadsen commented 10 years ago

I just published trace version 1.0.6; this should be a factor of 8 faster (according to my synthetic benchmark). Could you try it, with and without Error.stackTraceLimit = 100? My guess is that the difference is much smaller now.

> As far as I remember, long stack trace modules (longjohn etc) wrap every async call and remember the stack.

Well, longjohn only wraps setTimeout, setInterval, setImmediate, process.nextTick and EventEmitter. So the stacks in longjohn are usually quite short and thus don't compare.

> I guess your module does the same, using the new node API, right?

Right, it's awesome. I'm hoping it makes it into 0.12.0, but it is still in limbo.

> In the case of the usual sequential code: A(); B(); C(); -- the stack trace will have only C, because A,B actually finished. What we can do about it?

If I understand correctly: you would like to exclude stack lines from A() or B(), just like if you executed A(); B(); C(); directly.

The question is similar to https://github.com/AndreasMadsen/trace/issues/6, and the solution might be the same. If we know the error happens in C(), we really don't need to keep track of what happened in A() or B().

If we can somehow wrap C(), detect internally in trace when C() is called, and only after that start tracking the async calls, then we can skip A() and B() and the stack trace collection would be much faster. I think it is totally possible, I just need to wrap my head around it. Keeping a sync state across an unknown setup of async events is not as easy as it sounds.

iliakan commented 10 years ago

The new version seems to work. I will leave it running in dev env now and see how it goes.

iliakan commented 10 years ago

With the latest trace on, the Node.JS process runs for some time, eats more and more memory, then dies with an out-of-memory message.