benjamn / recast

JavaScript syntax tree transformer, nondestructive pretty-printer, and automatic source map generator
MIT License

memory usage very bad with print #177

Open danielr-minted opened 9 years ago

danielr-minted commented 9 years ago

I have been writing a custom loader for webpack to do some preprocessing on our JavaScript. It's been incredibly slow, so I've been collecting timing metrics. With source maps enabled, a call to .print() takes hundreds of milliseconds, versus about 10 ms for prettyPrint(). It's brutally awful. Compiling 7 MB of JavaScript source results in out-of-memory errors in Node. prettyPrint() is not an effective workaround for me right now because I have some UTF-8 control characters which are unescaped when pretty printing and cause webpack to barf.

Are there any known issues around memory usage and cpu performance of .print()?

benjamn commented 9 years ago

If we can come up with a representative test case that runs in a browser, then we can use the Chrome dev tools memory profiler to see what kinds of objects are dominating.

How drastic is the preprocessing that you're doing? Are you inserting new nodes, or mostly modifying existing nodes? Is it per-file, or per-function, or what? If you are comfortable sharing your actual code and preprocessing transform, that would make the reproduction easy, but of course I understand if that's not possible.

Can you confirm that this problem only occurs (or is much worse) with source maps enabled? If so, then we may be creating too many Mapping objects in https://github.com/benjamn/recast/blob/master/lib/lines.js.

My suspicion is that CPU performance is suffering mostly because of garbage collection. I haven't done much memory profiling of recast.print, but I have done quite a bit of CPU profiling and optimization, so I'm hopeful the performance problems will go away when the memory usage problems are fixed.
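One way to check whether the print step is where the allocation happens is to wrap each call in a small timing and heap-delta helper under Node. The following is a minimal sketch, not part of recast: `measure` is a hypothetical helper name, and the heap delta from `process.memoryUsage()` is only indicative, since garbage collection can run at any point during the call.

```javascript
// Hypothetical helper (not part of recast): time a synchronous call and
// report how much the V8 heap grew while it ran.
function measure(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaBytes = process.memoryUsage().heapUsed - heapBefore;
  return { label, elapsedMs, heapDeltaBytes, result };
}

// Intended use, assuming recast is installed and `source` holds one file
// (the file names below are placeholders):
//   const recast = require("recast");
//   const ast = recast.parse(source, { sourceFileName: "input.js" });
//   measure("print + map", () => recast.print(ast, { sourceMapName: "input.js.map" }));
//   measure("prettyPrint", () => recast.prettyPrint(ast));
```

Comparing the two measurements per file, with and without the source map options, should show whether the slowdown and heap growth follow source map generation specifically.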

danielr-minted commented 9 years ago

I'm trying to convert about 350 JavaScript files from plain script tags and concatenation to being importable from Node and usable with webpack. The trick is that all of these files have to work with or without the header, across both our old test suite and our new one. To do this I have a header that looks like this:

var module; UMD({
  "minted:some/path/to/file": ["Sym1", "Sym2"],
  "templates:some/template/path": []
})(this, function(module1) {
  // module implementation here

  return {
    SomeSymbol: SomeSymbol
  };
});

The huge constraint is that I have to try to extract benefit from a module system without tackling the entire modularization for the whole site (which would be 800 files with lots of interesting circular dependencies and strange load orders). So by putting my header in a function, I can provide different implementations in different environments and successfully create an incremental transition to a module system. This is a perfect fit for recast, fwiw.

The problem is that webpack is not clever enough to evaluate my module to figure out the packing, so I need to write a decompiler from my header format to a plain CJS system. I change the above header to...

var __imports__ = [require('some/path/to/file'), require('/some/absolute/template/path')];
module.exports = function (module1) {
  // module goes here...
  return {
    SomeSymbol: SomeSymbol
  };
}.apply(window, __imports__);
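The import line in that rewritten header can be derived mechanically from the keys of the UMD dependency map. Here is a rough sketch of that one step, independent of recast. The names `buildImportsLine` and `stripPrefix` are hypothetical, and the rule for turning a prefixed key into a require path is an assumption, since the actual prefix-to-path mapping is not spelled out here.

```javascript
// Sketch only: build the CommonJS import line from the UMD dependency map.
// `resolvePath` is a hypothetical callback; the real prefix-to-path rules
// are not given in this thread.
function buildImportsLine(dependencyMap, resolvePath) {
  const requires = Object.keys(dependencyMap).map(function (key) {
    return "require(" + JSON.stringify(resolvePath(key)) + ")";
  });
  return "var __imports__ = [" + requires.join(", ") + "];";
}

// Example resolver (assumed rule): drop everything up to the first colon.
function stripPrefix(key) {
  return key.slice(key.indexOf(":") + 1);
}

const line = buildImportsLine(
  {
    "minted:some/path/to/file": ["Sym1", "Sym2"],
    "templates:some/template/path": []
  },
  stripPrefix
);
// line is:
// var __imports__ = [require("some/path/to/file"), require("some/template/path")];
```

Keeping the resolver as a parameter lets each environment (old test suite, new test suite, webpack) supply its own path mapping without changing the code that generates the header.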

I have also done some timing profiling. First, pretty printing is about as fast as regular printing without source maps. Second, printing with source maps is way, way slower, sometimes taking as long as a second. Third, memory usage really skyrockets as webpack goes along (I'm not convinced all the memory is being released properly). I've tried disabling the stringCache and that gets me further along, but it's still slow and still crashes.