albanm / node-libxslt

Node.js bindings for libxslt compatible with libxmljs

Memory leak #45

Open tornillo opened 8 years ago

tornillo commented 8 years ago

Hi!

Memory leaks began to appear after this commit https://github.com/albanm/node-libxslt/commit/fcce1298b8c84c3fc26727931a15fc75785840b7

Here are some examples of measurements:

(before) vs (after), in megabytes:

counter = 1e2:   43 vs   43
counter = 1e3:  220 vs  420
counter = 3e3:  240 vs 1261

Test code:

'use strict';

var libxslt = require('libxslt');

var xml = '<?xml version="1.0" encoding="utf-8"?>\n<root>';
for (let i = 0; i < 1e3; i++) {
    xml += `<item>${i}</item>`;
}
xml += '</root>';

var xsl = '<?xml version="1.0" encoding="utf-8"?>\n' +
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' +
    '<xsl:output version="1.0" indent="no" omit-xml-declaration="yes" />' +
    '<xsl:template match="/"><xsl:apply-templates select="root/item" /></xsl:template>' +
    '</xsl:stylesheet>'
;

var xslObj = libxslt.parse(xsl);

var rss = process.memoryUsage().rss;

var counter = 1e2;

for (let i = 0; i < counter; i++) {
    xslObj.apply(xml, function () {});
}

console.log(Math.round((process.memoryUsage().rss - rss) / Math.pow(2, 20)));

Unfortunately, a segfault occurs if the worker arguments are not saved with SaveToPersistent (details here: https://github.com/albanm/node-libxslt/issues/28). But when the arguments are saved, the GC can't collect them, and memory usage grows with every transform call.
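If apply() also works synchronously when no callback is passed (as node-libxslt's documentation describes), one way to check whether the growth is specific to the async worker path is to run the same loop through both paths and compare the RSS deltas. This is only a diagnostic sketch, reusing the xml string and xslObj from the test code above:

// Diagnostic sketch: compare RSS growth for the synchronous and asynchronous
// apply paths, reusing the xml string and xslObj defined in the test code above.
// Assumes apply() without a callback runs synchronously.
var counter = 1e3;

var rssSync = process.memoryUsage().rss;
for (let i = 0; i < counter; i++) {
    xslObj.apply(xml); // synchronous path: no AsyncWorker, no SaveToPersistent
}
console.log('sync:  ' + Math.round((process.memoryUsage().rss - rssSync) / Math.pow(2, 20)) + ' MB');

var rssAsync = process.memoryUsage().rss;
var waiting = counter;
for (let i = 0; i < counter; i++) {
    xslObj.apply(xml, function () {
        if (--waiting === 0) {
            console.log('async: ' + Math.round((process.memoryUsage().rss - rssAsync) / Math.pow(2, 20)) + ' MB');
        }
    });
}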

gagern commented 8 years ago

Theoretically, the worker should be destructed once it has concluded its work (one way or another). At that point, the persistent handles should be destructed as well, allowing the objects they pointed to, and everything therein, to be garbage collected. Obviously, some of this doesn't work as intended. It might be a good idea to add debugging output to the destructor of the worker, to see whether it actually gets destroyed, and if not, how to clean it up properly. Does someone have time for this? I'm pretty busy just now.
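A related check can be made purely from the JS side, sketched here under the assumption of a Node.js version that has FinalizationRegistry and is started with --expose-gc: if a worker really is destructed and its persistent handles released, the callback passed to apply() becomes collectable again; if the worker leaks, the callback stays alive indefinitely.

// Sketch: watch whether the callbacks handed to apply() ever become collectable.
// If the native worker and its persistent handles are cleaned up after the
// transform completes, these functions should be finalized after a gc().
// Requires a Node.js version with FinalizationRegistry and --expose-gc;
// reuses xml and xslObj from the test code above.
var finalized = 0;
var registry = new FinalizationRegistry(function () { finalized++; });

for (let i = 0; i < 100; i++) {
    var cb = function () {};
    registry.register(cb, i);
    xslObj.apply(xml, cb);
    cb = null; // drop our own reference so only the worker keeps the callback alive
}

setTimeout(function () {
    gc();
    setTimeout(function () {
        console.log('callbacks finalized: ' + finalized + ' / 100');
    }, 100);
}, 1000);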

gagern commented 8 years ago

I had a closer look at the example. It's broken in several ways. In particular, it prints the statistics immediately after all the asynchronous transforms have been queued, before they even had a realistic chance to start. It also doesn't trigger a garbage collection, so much of the memory claimed to be leaking might in fact just be uncollected garbage. A better second half would be something like this:

var counter = 1e2;
var waiting = counter;
for (let i = 0; i < counter; i++) {
  xslObj.apply(xml, function () { if (--waiting === 0) done(); });
}

function done() {
  gc(); // requires the --expose-gc flag
  console.log(Math.round((process.memoryUsage().rss - rss) / Math.pow(2, 20)));
}

That said, I still see higher memory consumption for larger loop counts, and valgrind reports some lost blocks as well. Mostly single blocks, but in one record the number of lost blocks is close to the iteration count:

284,102 bytes in 98 blocks are definitely lost in loss record 357 of 358
   at malloc (vg_replace_malloc.c:299)
   by libxmljs::memMalloc(unsigned long) (in node_modules/libxmljs-mt/build/Release/xmljs.node)
   by xmlStrndup (in node_modules/libxmljs-mt/build/Release/xmljs.node)
   by xsltSaveResultToString (in build/Release/node-libxslt.node)
   by ApplyWorker::HandleOKCallback() (in build/Release/node-libxslt.node)
   by Nan::AsyncExecuteComplete(uv_work_s*) (in build/Release/node-libxslt.node)

Yes, that makes sense: we are constructing V8 string objects but never freeing the underlying XML strings. I just filed #53 for that.

According to valgrind, there is also a pretty big chunk of memory lying around because of an XML document that didn't get collected. It looks like it might come from the var docResult = new libxmljs.Document() in our JS code. And if there is only one of those, that means all the earlier ones did get collected cleanly. Plus the reported size doesn't scale with the loop count; it stays at approximately 400k, far less than the several megabytes reported by this test. My guess is that the memory reported there simply hasn't been garbage-collected yet. I'm not sure I trust the exposed GC to show full results immediately.
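One way to probe that last point, sketched here as a variant of the done() above rather than something verified in this thread: run the exposed collector a few extra times, sample memory on a later tick so pending finalizers get a chance to run, and report heapUsed next to rss to separate uncollected JS garbage from memory that is genuinely retained on the native side.

// Sketch: a more patient variant of done(), reusing the rss baseline from the
// test code above. Still requires --expose-gc.
function donePatient() {
    gc(); gc(); gc(); // several passes, since one collection may not release everything
    setTimeout(function () {
        var usage = process.memoryUsage();
        console.log('rss delta: ' + Math.round((usage.rss - rss) / Math.pow(2, 20)) + ' MB');
        console.log('heapUsed:  ' + Math.round(usage.heapUsed / Math.pow(2, 20)) + ' MB');
    }, 1000);
}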