dichovsky / pdf-to-png-converter

Library Convert PDF to PNG
MIT License

For some PDFs an error can cause the process to run out of memory #33

Open Zirafnik opened 1 year ago

Zirafnik commented 1 year ago

A PDF with many pages causes Node to run out of memory.

This is a similar issue to the one in 'pdf2pic' library: https://github.com/yakovmeister/pdf2image/issues/54

It can be solved with manual batching using arrays of page numbers, but it is not a sexy solution: you first need to determine the number of pages in the document with an external library (such as 'pdfjs-dist'), then build the arrays, allowing for a page count that doesn't divide evenly into batches.
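For reference, the page-number arrays themselves are simple to build once the page count is known (for example from the `numPages` property of a document loaded with pdfjs-dist). A minimal sketch follows. `chunkPageNumbers` is a hypothetical helper, and each batch is assumed to be passed to the converter's page-selection option (believed to be `pagesToProcess`; check the README for the version you use).

```typescript
// Hypothetical helper (not part of pdf-to-png-converter): split 1-based page
// numbers 1..totalPages into consecutive batches of at most `batchSize` pages.
// The last batch is shorter when totalPages is not a multiple of batchSize.
function chunkPageNumbers(totalPages: number, batchSize: number): number[][] {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError("totalPages must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: number[][] = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    const batch: number[] = [];
    for (let page = start; page <= end; page++) batch.push(page);
    batches.push(batch);
  }
  return batches;
}

// A 68-page document in batches of 10: six full batches plus pages 61-68.
console.log(chunkPageNumbers(68, 10).map((b) => `${b[0]}-${b[b.length - 1]}`).join(", "));
// prints "1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-68"
```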

Zirafnik commented 1 year ago

Update: The batching doesn't work.

Even with the output folder set, the library still keeps the buffers in memory, which eventually exhausts it.
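One way to check whether buffers are really being retained is to run the batches strictly one at a time and log Node's memory counters after each. If `arrayBuffers` (which Node also counts inside `external`) keeps climbing across many batches even though the caller discards each result, the memory is held below the caller. A sketch, assuming a hypothetical `convert` callback that renders one batch of pages:

```typescript
// Hypothetical callback: render one batch of pages (e.g. the converter called
// with a page-selection option and an output folder). Its result is ignored.
type ConvertBatch = (pages: number[]) => Promise<unknown>;

function formatMiB(bytes: number): string {
  return `${(bytes / 1024 / 1024).toFixed(1)} MiB`;
}

// Runs batches strictly one after another and logs process.memoryUsage()
// after each. `arrayBuffers` covers ArrayBuffers and Node Buffers, which is
// where rendered image data ends up; it is also included in `external`.
async function runBatchesSequentially(
  batches: number[][],
  convert: ConvertBatch,
  log: (line: string) => void = console.log,
): Promise<void> {
  for (const pages of batches) {
    // Await the batch but keep no reference to its result, so the GC is free
    // to reclaim the previous batch's buffers before the next one starts.
    await convert(pages);
    const mem = process.memoryUsage();
    log(
      `pages ${pages[0]}-${pages[pages.length - 1]}: ` +
        `heapUsed=${formatMiB(mem.heapUsed)} external=${formatMiB(mem.external)} ` +
        `arrayBuffers=${formatMiB(mem.arrayBuffers)}`,
    );
  }
}
```

Memory is only returned when the GC actually runs, so one high reading proves little; a steady upward trend over many batches is the signal worth reporting.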

Update: It turns out I was testing with a PDF that had some kind of internal error but otherwise looked fine. Page 55 of 68 was broken, and rendering it broke one of the dependencies. To rule out the "too many pages" assumption, I tested with only pages 54-58, and it still crashed. As soon as I repaired the PDF with an external tool, it worked.

I suspected a broken or missing font because of the warnings below (output at verbosityLevel: 1), which led me to look for ways to fix my PDF. The warning comes from Mozilla's pdf.js: https://github.com/mozilla/pdf.js/issues/3768#issuecomment-36468349

However, since the warning fired for ALL pages and nothing broke before page 55, I believe the problem lies elsewhere; the crashing dependency appears to be canvas.node.

I am not sure how to fix this or what exactly causes it. Perhaps some kind of error handling could be added?
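On the error-handling question: the crash is a V8 fatal out-of-memory abort (`FATAL ERROR: ... process out of memory`, then `Aborted`), which terminates the whole process, so a try/catch around the conversion call cannot intercept it. A caller-side workaround, sketched here under stated assumptions, is to render each page in its own Node process so a bad page kills only that child and the parent can record which page failed. `scanPages`, `childProcessRunner`, and the worker script it launches are hypothetical; the worker is assumed to take the PDF path and a page number as its two arguments, convert that one page, and exit with code 0 on success.

```typescript
import { spawnSync } from "node:child_process";

// Outcome of rendering one page in its own Node process.
interface PageResult {
  page: number;
  ok: boolean;
  detail: string; // how the child ended, for the report
}

// Runs one page and reports how it ended; injected so the scan is testable.
type RunPage = (page: number) => { status: number | null; signal: string | null };

// Hypothetical default runner: one fresh Node process per page, running a
// worker script that converts exactly the requested page.
function childProcessRunner(workerScript: string, pdfPath: string, timeoutMs = 120_000): RunPage {
  return (page) => {
    const res = spawnSync(process.execPath, [workerScript, pdfPath, String(page)], {
      stdio: "inherit",
      timeout: timeoutMs,
    });
    return { status: res.status, signal: res.signal };
  };
}

// Converts pages one by one. An out-of-memory abort kills only the child
// (typically seen here as signal SIGABRT), so the scan continues.
function scanPages(pages: number[], runPage: RunPage): PageResult[] {
  return pages.map((page) => {
    const { status, signal } = runPage(page);
    const ok = status === 0 && signal === null;
    let detail: string;
    if (signal !== null) detail = `signal ${signal}`;
    else if (status === null) detail = "failed to start";
    else detail = `exit code ${status}`;
    return { page, ok, detail };
  });
}

// Usage (not run here; paths are placeholders):
//   const results = scanPages([54, 55, 56, 57, 58],
//     childProcessRunner("./convert-one-page.js", "./document.pdf"));
//   results.filter((r) => !r.ok).forEach((r) => console.log(r.page, r.detail));
```

Spawning a process per page is slow, so this suits narrowing down a failing page (as done manually with pages 54-58) more than routine conversion; batches of pages per child are a reasonable middle ground.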

Warnings:

...
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
...

Errors:

<--- Last few GCs --->

[20233:0x5bbd220]    71971 ms: Mark-sweep (reduce) 131.3 (170.1) -> 131.2 (136.8) MB, 140.6 / 0.0 ms  (average mu = 0.681, current mu = 0.000) external memory pressure; GC in old space requested
[20233:0x5bbd220]    72130 ms: Mark-sweep (reduce) 131.2 (136.8) -> 131.2 (136.6) MB, 159.3 / 0.0 ms  (average mu = 0.494, current mu = 0.000) external memory pressure; GC in old space requested

<--- JS stacktrace --->

FATAL ERROR: v8::ArrayBuffer::New Allocation failed - process out of memory
 1: 0xb7a940 node::Abort() [node]
 2: 0xa8e823  [node]
 3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd5cdeb  [node]
 6: 0xd6d99d  [node]
 7: 0x7f4cd86d8b3c Context2d::GetImageData(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
 8: 0x7f4cd86cb3d3  [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
 9: 0xdbaa30  [node]
10: 0xdbbf6f v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x16fb7b9  [node]
Aborted

AChangXD commented 1 year ago

@Zirafnik Same issue here

dichovsky commented 2 months ago

Please try v3.3.0