devongovett / node-wkhtmltopdf

A wrapper for the wkhtmltopdf HTML to PDF converter using WebKit

Need a way to finish the spawned process if wkhtmltopdf never responds #96

Open: keithrz opened this issue 7 years ago

keithrz commented 7 years ago

Use case: We're using wkhtmltopdf (version 0.12.4 with patched Qt) with the --window-status argument. If the page loaded by wkhtmltopdf never sets the expected window status, wkhtmltopdf never finishes. Logged against wkhtmltopdf as https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2490

Justification for using --window-status: we use wkhtmltopdf to print web pages that run a lot of JavaScript and make many long-running requests before they are ready to print.
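For context, --window-status makes wkhtmltopdf wait until the page's window.status equals the given string. Below is a page-side sketch of that handshake, assuming the page knows in advance how many asynchronous widgets it must wait for. createReadySignal is a hypothetical helper, and 'allWidgetsRendered' is simply the status string used later in this thread. It is written in ES5 because the QtWebKit engine bundled with wkhtmltopdf 0.12.x predates ES2015. If any widget never calls done(), window.status is never set, which is the hang this issue describes.

```javascript
// Page-side sketch (ES5 only, for wkhtmltopdf's old QtWebKit).
// Returns a done() callback; once it has been called `pending` times,
// target.status is set to statusText, which releases --window-status.
function createReadySignal(pending, statusText, target) {
  var remaining = pending;
  return function done() {
    remaining -= 1;
    if (remaining === 0) {
      target.status = statusText;
    }
  };
}

// Hypothetical usage in the page being printed:
//   var done = createReadySignal(3, 'allWidgetsRendered', window);
//   // ...each of the three widgets calls done() after its request finishes.
```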

With this wrapper (version 0.3.4), when we run into wkhtmltopdf issue 2490, the child processes never finish: neither the /bin/sh process that launches wkhtmltopdf nor the wkhtmltopdf process itself exits (on Linux or Mac OS X).

Either we need a reference to the child process spawned by wkhtmltopdf(), or we need an option such as "processTimeout" that specifies how many milliseconds to wait before killing the child process (if it is still running). Ideally, the read stream would also be closed at that point.

Thoughts? If you point me in the right direction, my coworker and/or I may be able to work on a pull request.

zxlin commented 7 years ago

It should be possible to just add a setTimeout after the child is spawned (around line 100) that kills the child process after an optional, user-specified delay.
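A minimal sketch of the setTimeout approach, written as a standalone helper rather than as a patch to the wrapper. spawnWithTimeout is a hypothetical name, and its processTimeout parameter borrows the option name proposed in the original post. On timeout it kills the child and also destroys the stdout stream, as that post asks. Note the caveat raised in the following comments: if the spawned command is /bin/sh, killing it may leave the wkhtmltopdf process it started still running.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: spawn `command` and kill it if it is still running
// after `processTimeout` milliseconds.
function spawnWithTimeout(command, args, processTimeout) {
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'ignore'] });
  let timedOut = false;

  const timer = setTimeout(() => {
    timedOut = true;
    child.kill('SIGKILL');   // a hung process may ignore SIGTERM
    child.stdout.destroy();  // close the read stream handed to the caller
  }, processTimeout);

  const done = new Promise((resolve, reject) => {
    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer);   // finished in time: cancel the pending kill
      resolve({ code, signal, timedOut });
    });
  });

  return { child, stream: child.stdout, done };
}
```

Returning the child handle alongside the stream also covers the other option from the original post: a caller could kill the process itself instead of relying on the timer.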

royazuniga commented 7 years ago

Hi,

Setting a timeout on the child or the stream seems to kill only the parent /bin/sh process when running on platform === 'darwin'. However, /bin/sh starts the wkhtmltopdf process, and that process does not end when the timeout fires.

One thought was to avoid the issue entirely by spawning wkhtmltopdf without /bin/sh. Unfortunately, that process doesn't end on its own, even though it is still able to generate valid PDFs.

To do that, assuming wkhtmltopdf is available, I'm spawning the process with var child = spawn(wkhtmltopdf.command, [ ...args]); and writing the output to stdout. The process never progresses past 90% and times out.
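A sketch of spawning wkhtmltopdf directly, with no shell, and collecting the PDF from stdout. convertToBuffer and windowStatusArgs are hypothetical helpers; the arguments mirror the debug output quoted in this thread. The sketch reads both stdout and stderr because, in general, a child that writes to a piped stream nobody reads can block once the OS pipe buffer fills. Whether that is what stalls the process at 90% here is an unverified assumption worth ruling out, not a diagnosis.

```javascript
const { spawn } = require('child_process');

// Run `command` directly (shell: false is spawn's default), passing args as
// an array so no shell quoting is needed. Both output streams are consumed.
function convertToBuffer(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const chunks = [];
    let stderr = '';
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.stderr.setEncoding('utf8');
    child.stderr.on('data', (text) => { stderr += text; });
    child.on('error', reject);
    child.on('close', (code, signal) => {
      resolve({ code, signal, pdf: Buffer.concat(chunks), stderr });
    });
  });
}

// Arguments matching the command shown in the debug output; '-' sends the PDF to stdout.
function windowStatusArgs(url) {
  return ['--window-status', 'allWidgetsRendered', '--no-stop-slow-scripts', url, '-'];
}

// Hypothetical usage:
//   convertToBuffer('wkhtmltopdf', windowStatusArgs(url)).then(({ code, pdf }) => { /* ... */ });
```

Since wkhtmltopdf can exit with code 1 on a network error and still produce a valid PDF (see the next paragraphs), a caller may want to inspect the buffer rather than treat every non-zero code as a failure.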

With debug enabled, I see this wkhtmltopdf command being executed: wkhtmltopdf --window-status "allWidgetsRendered" --no-stop-slow-scripts "<some URL that sets the window status>" -

If I run the same command in the terminal and append > ~/Documents/sample_page.pdf, the PDF is valid and the process exits with this message: Exit with code 1 due to network error: RemoteHostClosedError

Any information or insight you have would be great!

adamczykjac commented 7 years ago

@royazuniga I'm having the same issue:

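const fs = require('fs');
const wkhtmltopdf = require('wkhtmltopdf');

// html, fileName, pdfModule and getBase64String are defined elsewhere in the app.
try {
  wkhtmltopdf(html, { encoding: 'UTF-8', debugJavascript: true }, (error, stream) => {
    if (error) {
      console.log(error);
      pdfModule.reject(error);
      return;
    }
    const outputPDF = fs.createWriteStream(fileName);
    // Stream errors are asynchronous, so the surrounding try/catch cannot see them.
    stream.on('error', (err) => {
      console.log(err);
      pdfModule.reject(err);
    });
    stream.pipe(outputPDF);
    outputPDF.on('finish', () => {
      pdfModule.resolve({ fileName, base64: getBase64String(outputPDF.path) });
      // fs.unlink requires a callback on current Node.js versions.
      fs.unlink(outputPDF.path, (err) => {
        if (err) console.log(err);
      });
    }).on('error', (err) => {
      console.log(err);
      pdfModule.reject(err);
    });
  });
} catch (exception) {
  // Only catches errors thrown synchronously while starting the conversion.
  console.log(exception);
  pdfModule.reject(exception);
}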

No error is returned and no exception is caught. What I'm passing as html is server-side rendered (SSR) React components with inline styles (one of them is a big one: Bootstrap). Interestingly enough, I narrowed it down: the process exits (and the file is stored and passed to the browser) when some CSS styles are removed or added. So it seems some limit is being exceeded (it sounds dumb, but perhaps the size of the stream?), which keeps the process from exiting. Rendering the same HTML to a file first (31 kB) and then putting it through wkhtmltopdf works fine.
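The render-to-a-file-first workaround can be sketched as below. This assumes, without confirmation, that the hang is tied to handing a large HTML string to the wrapper rather than to the HTML itself. writeHtmlToTempFile and convertFile are hypothetical helpers; convertFile calls the wkhtmltopdf CLI with an input file path and an output file path, so nothing is piped through stdin or stdout.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { spawn } = require('child_process');

// Write the rendered HTML to a fresh temporary directory and return the file path.
function writeHtmlToTempFile(html) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'wkhtml-'));
  const htmlPath = path.join(dir, 'page.html');
  fs.writeFileSync(htmlPath, html, 'utf8');
  return htmlPath;
}

// Convert a local HTML file to a PDF file; resolves with the exit code.
function convertFile(htmlPath, pdfPath, command = 'wkhtmltopdf') {
  return new Promise((resolve, reject) => {
    const child = spawn(command, [htmlPath, pdfPath], { stdio: 'ignore' });
    child.on('error', reject);
    child.on('close', (code) => resolve(code));
  });
}
```

For example: convertFile(writeHtmlToTempFile(html), fileName).then((code) => { ... }). The temporary directory (path.dirname of the returned path) is left for the caller to remove.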

EDIT: Running on macOS Sierra. I will try running the app (MeteorJS-backed) in a Docker container and let you know if the same issue applies.

zxlin commented 7 years ago

@royazuniga Is that behavior only on OSX and not on Linux?

royazuniga commented 7 years ago

@zxlin I've seen it on both OS X (local) and Linux (staging and prod).

zxlin commented 7 years ago

@royazuniga The original author put in the shell spawn for a reason and I'm not comfortable just removing it. It seems the wkhtmltopdf repo is discussing the possibility of adding timeouts to wkhtmltopdf itself.

Perhaps we can find the descendant process that the shell spawns and kill it directly. Thoughts?
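One way to do that on POSIX systems (Linux and macOS) is to start the shell in its own process group using Node's detached: true spawn option, then on timeout signal the whole group by passing a negative pid to process.kill(). That reaches wkhtmltopdf as well as /bin/sh. This is a sketch, not the wrapper's code: spawnGroupWithTimeout is a hypothetical helper, and the approach does not apply on Windows, where process groups work differently.

```javascript
const { spawn } = require('child_process');

// POSIX-only sketch: run `commandLine` under /bin/sh in a new process group
// and kill the entire group if it is still running after `timeoutMs`.
function spawnGroupWithTimeout(commandLine, timeoutMs) {
  const child = spawn('/bin/sh', ['-c', commandLine], {
    detached: true, // the shell becomes leader of a new process group
    stdio: ['ignore', 'pipe', 'ignore'],
  });
  let timedOut = false;

  const timer = setTimeout(() => {
    timedOut = true;
    try {
      process.kill(-child.pid, 'SIGKILL'); // negative pid targets the whole group
    } catch (err) {
      if (err.code !== 'ESRCH') console.error(err); // ESRCH: group already gone
    }
  }, timeoutMs);

  const done = new Promise((resolve, reject) => {
    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });

  return { child, stream: child.stdout, done };
}
```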

cuongthai commented 4 years ago

In our case, we don't want wkhtmltopdf to keep processing large documents; it's better to just kill it. We patched index.js at line 200: wkhtmltopdf.command = 'timeout 5s wkhtmltopdf';
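Since this patch only changes wkhtmltopdf.command, assigning the same string from application code may work without editing index.js. A sketch, assuming GNU coreutils timeout is on the PATH (macOS does not ship it by default; Homebrew's coreutils installs it as gtimeout). timeoutPrefixed and timedOut are hypothetical helpers. GNU timeout exits with status 124 when the time limit is reached, which distinguishes a timeout from the exit code 1 reported earlier in this thread.

```javascript
// Build a command string that prefixes a command with GNU `timeout`,
// as in the patch above.
function timeoutPrefixed(command, seconds) {
  return `timeout ${seconds}s ${command}`;
}

// GNU timeout exits with status 124 when the limit is hit
// (unless --preserve-status is given).
function timedOut(exitCode) {
  return exitCode === 124;
}

// Hypothetical usage from application code instead of editing index.js:
//   const wkhtmltopdf = require('wkhtmltopdf');
//   wkhtmltopdf.command = timeoutPrefixed('wkhtmltopdf', 5); // 'timeout 5s wkhtmltopdf'
```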

yurks commented 3 years ago

A timeout option is proposed in this pull request: https://github.com/devongovett/node-wkhtmltopdf/pull/133