diegomura / react-pdf

📄 Create PDF files using React
https://react-pdf.org
MIT License
14.28k stars 1.12k forks source link

Memory Leak #378

Closed coreyweidenhammer closed 5 years ago

coreyweidenhammer commented 5 years ago

Before I get into the issue, I just wanted to thank @diegomura and others for your work and responsiveness on this wonderful library! Your efforts are greatly appreciated.

OS:

React-pdf version: 1.0.0-alpha.24, 1.0.0-alpha.25 Node v8.9.3

Description: We are consistently observing a memory leak, with the memory footprint of our node process growing linearly with each generated PDF until eventually exhausting the resources on our servers.

My team saw https://github.com/diegomura/react-pdf/issues/165 (now closed), but we are uncertain as to whether what we are experiencing is related or not. As noted above, we have tried using both alpha.24 and alpha.25 with the same results.

We first noticed the memory leak on servers running in AWS via ElasticBeanstalk (Amazon Linux) but have since been able to recreate it on our local development environments running macOS 10.13.6. We attempted isolating the source of the leak ourselves but have not yet come up with anything conclusive. We're not even sure if the memory leak is (as before) related to a dependency vs the react-pdf code itself, but we are hoping you might be able to identify something more readily.

How to replicate issue: We created a small project with limited dependencies that should allow you to recreate the issue as we see it. Zip attached here: pdfGenerator.zip

There are instructions in server.js, repeated below:

RUN $ yarn / npm install and then run: $ npm run start-with-chrome-inspect

  1. In a Chrome browser window, go to http://localhost:3000/.
  2. In a new tab go to: chrome://inspect/#devices and click inspect under the target. In the devtools window, select the Memory tab at the top. Select 'Heap snapshot'. Click 'Take snapshot'.
  3. Open http://localhost:3000/pdf in a new tab.
  4. Go back the devtools window and click the dark circle icon at top-left above the Profiles pane to generate another snap.
  5. Repeat steps 4 and 5 repeatedly and you can see the retained memory increase. (JSArrayBufferData especially)

An alternate path to viewing the memory leak at a higher level is to monitor via pm2: $ yarn / npm install pm2 $ pm2 start server.js $ pm2 monit -- proceed to access http://localhost:3000/pdf repeatedly -- and watch the memory footprint grow.

Note that the memory footprint grows in proportion to the size of the PDF generated, so if you want to see it grow/leak more readily, you can create a larger PDF with more content, images, etc.

Thank you in advance for your help!

diegomura commented 5 years ago

Hey @coreyweidenhammer Sorry I didn't respond earlier. Thanks for reporting this and the project to replicate it.

I don't know wat may be causing this to be honest. My first guess is that react-pdf caches assets in memory to avoid network request for the same item, but it has a limit, so the memory footprint should stop increasing eventually.

I'll try to find some time to look a it, but it's going to be hard to debug.

diegomura commented 5 years ago

Related to https://github.com/foliojs/pdfkit/issues/258 ?

kations commented 5 years ago

i also experience memory leaks on client rendering but only on some devices

kations commented 5 years ago

img_0275

kations commented 5 years ago

It just happens on some client machines of different kind, Android, Mac etc. but I can't reproduce it myself.

n6g7 commented 5 years ago

tl;dr: Each render leaks PDFKit objects.


I've been running into this issue as well and put together a small reproduction gist: https://gist.github.com/n6g7/4d9d9af8eb5b943b61c89c1c6fa29a3b

When running this on my mac I get the data in this spreadsheet. Basically the the example above leaks about 250kB per render in memory.

I also used node-heapdump to inspect the objects in memory and I noticed many StandardFont, Root, PDFDocument, PDFReference, etc objects that aren't being garbage collected. I think these are pdfkit classes.

I don't really understand what's happenign with pdfkit, its last release was 2 years ago (v0.8.3) but there's been a lot of activity since then on the repo, which doesn't appear to have been released. I assume this is why react-pdf uses a fork (@react-pdf/pdfkit).

Next step is to reproduce the leak using pdfkit only, could someone point to the source code of @react-pdf/pdfkit?

diegomura commented 5 years ago

Thanks for your input and work on this! Sorry I cannot help much right now. I’m out of town until tomorrow. I’ll soon resume my work on this library. About the @react-pdf/pdfkit source code, you can find it here

Skywalker13 commented 5 years ago

@kations the problem is just yoga-layout-prebuilt (dep of @react-pdf/renderer) which is an asm.js build of yoga-layout (it's not a memory leak). If you use electron or node, you can prefer to patch @react-pdf/renderer in order to use yoga-layout, then it will build a native node module instead. The TOTAL_MEMORY=134217728 limit is only for the asm.js build.

n6g7 commented 5 years ago

@Skywalker13 I get similar results with a patched version of @react-pdf/renderer (using yoga-layout instead of yoga-layout-prebuilt). Could you elaborate on the changes you've made?

Skywalker13 commented 5 years ago

I suppose that you use react-pdf in frontend (browser side) and not in server side (with babel-node like me). In this case you can continue to use the asm.js version.

Look at the package.json of yoga-layout...

If I build for node by hand:

$ npm run build:node   

> yoga-layout@1.9.3 build:node /home/schroeterm/tmp/node_modules/yoga-layout
> npm run copy-sources && autogypi && node-gyp configure build

> yoga-layout@1.9.3 copy-sources /home/schroeterm/tmp/node_modules/yoga-layout
> ! npm -s run is-monolithic || (rsync -r --checksum --delete ../yoga/ sources/yoga/)

make : on entre dans le répertoire « /home/schroeterm/tmp/node_modules/yoga-layout/build »
  COPY Release/obj.target/nbind/geni/symbols.txt
  CXX(target) Release/obj.target/nbind/sources/yoga/Utils.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGEnums.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGNode.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGNodePrint.o
  CXX(target) Release/obj.target/nbind/sources/yoga/Yoga.o
BLABLABLA...
  SOLINK_MODULE(target) Release/obj.target/nbind.node
  COPY Release/nbind.node
make : on quitte le répertoire « /home/schroeterm/tmp/node_modules/yoga-layout/build »

I have a node module for nbind which is used by yoga-layout.

$ ls -la build/Release/
total 516
drwxr-xr-x 4 schroeterm schroeterm   4096 jan 31 07:08 .
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 ..
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 .deps
-rwxr-xr-x 1 schroeterm schroeterm 508360 jan 31 07:08 nbind.node
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 obj.target

No more TOTAL_MEMORY problem because it's a native build. Of course, it can be used only on server side.

And if you want to build for the browser, you must install an emscripten environment and change the TOTAL_MEMORY to an higher value. For example:

diff --git a/javascript/final-flags.gypi b/javascript/final-flags.gypi
index dd13e827..d3e1ad26 100644
--- a/javascript/final-flags.gypi
+++ b/javascript/final-flags.gypi
@@ -19,7 +19,7 @@
             "ldflags": [
                 "--memory-init-file", "0",
                 "-s", "PRECISE_F32=1",
-                "-s", "TOTAL_MEMORY=134217728"
+                "-s", "TOTAL_MEMORY=268435456"
             ],

            "xcode_settings": {

And try to run npm run build:browser

Hopding commented 5 years ago

We’ve encountered these heap overflow errors as well. They happen reliably when we try to generate documents with a large number of pages (~70 pages), or when we render a large number of separate documents with a single page.

Hopding commented 5 years ago

@diegomura It looks like pdfkit has been updated with some fixes that might solve this memory leak problem: https://github.com/foliojs/pdfkit/issues/258#issuecomment-468976025. How much work is involved in pulling these fixes into @react-pdf/pdfkit?

Tketa commented 5 years ago

Have encountered this issue as well.

Worked around it by moving the PDF generation to an AWS lambda for now. @diegomura do you need help pulling the changes in https://github.com/foliojs/pdfkit/issues/258#issuecomment-468976025 to your own pdfkit? Quite new here but happy to help if you think it could solve the issue here

diegomura commented 5 years ago

Yes!! Help is much appreciated. Don't have much time to that yet @Tketa

n6g7 commented 5 years ago

@Tketa let me know if I can help!

maxaggedon commented 5 years ago

@Tketa @diegomura @n6g7 I think this is critical to anyone using react-pdf in production, and it needs to go forward quickly. How do you think we should proceed ?

How does that sound to you ?

n6g7 commented 5 years ago

@maxaggedon Sounds good to me, I'm quite busy this week so can't commit to opening the PR and resolving conflicts but will review and test in my projects for sure.

diegomura commented 5 years ago

Also sounds good to me. I'm also in the middle of something right now (react-pdf related also), so I'm not sure if I can commit to work on this right now. I agree with your plan and I promise I'll review code when needed

Tketa commented 5 years ago

Hey @maxaggedon, agree with the plan but as mentioned I fixed the issue with the serverless solution so not a priority for me this week, can work on it this weekend however 👍

maxaggedon commented 5 years ago

know everyone is on board ! I will try to start the work during the week.

maxaggedon commented 5 years ago

PR is open 👆!

diegomura commented 5 years ago

Thanks @maxaggedon !! You rock I'll try to find some time this week to fully test that PR

bhunjadi commented 5 years ago

I've tried to pinpoint where exactly is @n6g7 gist leaking and made a couple of changes to leaktest.js:

function go(n) {
  for (let i = 1; i <= n; i++) {
    if (i === 1 || i === 500) {
      heapdump.writeSnapshot('out/' + Date.now() + '.heapsnapshot');
    }

    const element = React.createElement(Component);

    // taken from pdf() function
    const container = createInstance({ type: 'ROOT' });
    const mountNode = PDFRenderer.createContainer(container);
    // leaks occur because of updateContainer
    PDFRenderer.updateContainer(element, mountNode, null);
    // leak even more
    // container.render();

    ...
  }
}

~~We don't even need to render() for leaks to occur, something in PDFRenderer.updateContainer is maybe leaking. In that case we have a leaked Root and Page instances in heapdump. But I don't now anything about ReactFiberReconciler so could this be expected behaviour maybe? @diegomura Any ideas?~~

[edit] Looks like after using global.gc() just before writing snapshot everything is fine.

bhunjadi commented 5 years ago

@diegomura After doing some additional research, it looks like the problem is in the line this.layout.setMeasureFunc(this.measureImage.bind(this));

This is in both Image and Text constructor. Because of that text and image instances are stored in layout and can't be freed.

Somewhere in the code must be a call to this.layout.unsetMeasureFunc().

I think you were actually on the right path in this comment

Just to confirm that this is an issue I tried leaktest.js with modified render() of Text class:

async render() {
    // rendering code
    this.layout.unsetMeasureFunc();
  }

For a page with wrap: false this produced no leaks. But for a page with wrap: true there were leaks just as before because for the cloned Text intances unsetMeasureFunc was not called.

The question is where should measureFunc be unset?

n6g7 commented 5 years ago

Amazing progress, thanks @bhunjadi!!

This looks like a very promising lead now. :)

@diegomura any ideas where would be the correct place to call unsetMeasureFunc?

diegomura commented 5 years ago

Thanks for this amazing research! Sorry I've been a bit unresponsive.

I guess a good place to call unsetMeasureFunc is either before or after rendering, both should work since at that stage every elements knows where to position. However, because how page wrapping works (recursively clone elements while they are being placed in pages), it's highly probable that unsetMeasureFunc should be called also somewhere else. But not really sure where yet without further analysis.

One of the main performance bottlenecks of this project is the page wrapping algorithm. As I said, in order to contemplate all cases, it clones and re-layout many times, which makes it very hard to debug and uses a lot of memory. I've been trying to think of a simpler solution but couldn't find one yet that supports all what we currently do. And there's no prior state of the art solution to this (that I could find).

bhunjadi commented 5 years ago

I've added PR that looks like it is working.

Thanks @n6g7 for the clone hint for wrapped pages. In the end what I used in PR is cleanup in removeChild which seems to be working.

coreyweidenhammer commented 5 years ago

Thank you to everyone who has worked on this!

My team is still wrapping up testing all of our scenarios, but preliminary results from v1.5.6 appear to confirm that the memory leak is indeed resolved. I'll post another update when we are finished.

diegomura commented 5 years ago

Thank you @coreyweidenhammer for testing! Please post any further details. Closing this issue 😄

jeffijoe commented 4 years ago

~I'm on @react-pdf/renderer@^1.6.2 and having this problem it seems. My use case is exporting chat transcripts which can have a lot of messages, and after having exported around 700 PDFs, it stops doing anything (or panics with a heap out of memory error).~

~I tried using clinic to profile, and it indeed seems memory just keeps ramping up. Am I missing something? 😄~


EDIT: Seems I'm having another problem, will open a new issue 😄

oswaldoacauan commented 3 years ago

Anyone got around of building for Yoga for node? We are running the latest v2 and we are facing everyday this issue.

@Skywalker13

Skywalker13 commented 3 years ago

Patches exists for nbind and node v12, but it's annoying because it's necessary to fork yoga-layout in order to use an other nbind. And even in this case it's annoying because it breaks with electron 8 and in my case, I need yoga with node 12 and electron 8.

I've tried yoga-wasm but it's broken with react-pdf: https://github.com/rickbutton/yoga-wasm/issues/4

And just now, I've discovered https://github.com/pinqy520/yoga-layout-wasm

I will try this one asap...

oswaldoacauan commented 3 years ago

@Skywalker13 any luck on your try outs? We have no solution for this so far other than manually restarting the process :(

Skywalker13 commented 3 years ago

I tried a bit, it's not just a drop-in replacement. It's necessary to fork and adapt a little bit react-pdf (because the init is async with the wasm variants like yoga-layout-wasm), and maybe it doesn't work (like yoga-wasm), I don't know.

I've some success with react-pdf and yoga-layout built for some node versions but it's a bit annoying (for example https://github.com/Xcraft-Inc/yoga/commit/a0039ced2823892eef0f94a5b32a1809e9f4b34f).

IMHO It's definitively react-pdf which should use something else...