Zirafnik opened this issue 1 year ago
Update: The batching doesn't work.
Even with the output folder set, the library still keeps the image buffers in memory, which eventually exhausts it.
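To check whether memory is actually released between batches, one option is to log Node's memory counters after each batch and compare them. This is a minimal diagnostic sketch; the function name is hypothetical and not part of the converter library. The external and arrayBuffers counters track memory outside the JS heap that V8 knows about (such as Buffers); native allocations that V8 is not told about only show up in rss.

```typescript
// Snapshot of Node's memory counters, in MiB (rounded to one decimal).
interface MemorySnapshot {
  rss: number;          // total resident memory of the process
  heapUsed: number;     // JS heap actually in use
  external: number;     // memory bound to JS objects but outside the heap
  arrayBuffers: number; // ArrayBuffers and Buffers (a subset of external)
}

// Hypothetical helper: call it after each batch to see whether memory drops.
function memorySnapshotMiB(): MemorySnapshot {
  const toMiB = (bytes: number): number => Math.round((bytes / (1024 * 1024)) * 10) / 10;
  const usage = process.memoryUsage();
  return {
    rss: toMiB(usage.rss),
    heapUsed: toMiB(usage.heapUsed),
    external: toMiB(usage.external),
    arrayBuffers: toMiB(usage.arrayBuffers),
  };
}

console.log("after batch:", memorySnapshotMiB());
```

If rss or external keeps climbing from batch to batch instead of levelling off, something is still holding on to the rendered buffers.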
Update: Apparently I was testing with a PDF that had some kind of internal error but otherwise looked fine. Page 55 of 68 was broken, which in turn broke one of the dependencies. I tested with only pages 54-58 to rule out the too-many-pages hypothesis, and it still broke. As soon as I repaired the PDF with an external tool, it worked.
My first assumption was a broken or missing font, because of the warnings below (output at verbosityLevel: 1), which led me to look for ways to fix my PDF.
The warning comes from Mozilla's pdf.js: https://github.com/mozilla/pdf.js/issues/3768#issuecomment-36468349
However, since the warning fired for every page and nothing broke until page 55, I believe the problem lies elsewhere: the failing dependency appears to be canvas.node.
I am not sure how to fix this or what specifically causes it, but perhaps some kind of error handling could be added?
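One caveat on error handling: the FATAL ERROR in the log below goes through node::Abort() and ends with "Aborted", so the process is killed outright rather than throwing a JavaScript exception, and a try/catch around the conversion call would not catch it. A hedged workaround, my own assumption and not something the library provides, is to run each page or batch in a separate Node process and inspect its exit status, so a crash costs one batch instead of the whole run. In this sketch the child simply runs a JavaScript snippet; the runIsolated name and the snippet interface are hypothetical stand-ins for "convert pages X to Y".

```typescript
import { spawnSync } from "child_process";

// Outcome of one unit of work (e.g. one page batch) run in its own process.
interface IsolatedResult {
  ok: boolean;
  status: number | null; // exit code, or null if the child was killed by a signal
  signal: string | null; // e.g. "SIGABRT" after a fatal V8 out-of-memory abort
}

// Runs a JavaScript snippet in a fresh Node process. If the child aborts,
// only the child dies; the parent gets ok === false and can skip or retry.
function runIsolated(script: string): IsolatedResult {
  const child = spawnSync(process.execPath, ["-e", script], { stdio: "inherit" });
  return {
    // A spawn failure leaves both status and signal null, which counts as not ok.
    ok: child.status === 0 && child.signal === null,
    status: child.status,
    signal: child.signal,
  };
}

const result = runIsolated("process.exit(0)");
console.log(result.ok ? "batch succeeded" : `batch failed: ${result.signal ?? result.status}`);
```

Spawning a process per batch adds startup overhead, so this only makes sense with reasonably large batches.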
Warnings:
...
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
...
Errors:
<--- Last few GCs --->
[20233:0x5bbd220] 71971 ms: Mark-sweep (reduce) 131.3 (170.1) -> 131.2 (136.8) MB, 140.6 / 0.0 ms (average mu = 0.681, current mu = 0.000) external memory pressure; GC in old space requested
[20233:0x5bbd220] 72130 ms: Mark-sweep (reduce) 131.2 (136.8) -> 131.2 (136.6) MB, 159.3 / 0.0 ms (average mu = 0.494, current mu = 0.000) external memory pressure; GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: v8::ArrayBuffer::New Allocation failed - process out of memory
1: 0xb7a940 node::Abort() [node]
2: 0xa8e823 [node]
3: 0xd5c940 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xd5cce7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xd5cdeb [node]
6: 0xd6d99d [node]
7: 0x7f4cd86d8b3c Context2d::GetImageData(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
8: 0x7f4cd86cb3d3 [/home/user/folder1/folder2/node_modules/canvas/build/Release/canvas.node]
9: 0xdbaa30 [node]
10: 0xdbbf6f v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x16fb7b9 [node]
Aborted
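For a sense of scale: the failing frame is Context2d::GetImageData in canvas.node, and the fatal error is an ArrayBuffer allocation. getImageData returns RGBA data, i.e. 4 bytes per pixel, so each call needs width × height × 4 bytes outside the JS heap. That would be consistent with the GC log reporting "external memory pressure" while the heap itself stays around 131 MB, though I am assuming here that the converter calls getImageData once per rendered page. The page size below is an illustrative example, not taken from the PDF in question.

```typescript
// Bytes needed for one RGBA ImageData buffer (4 bytes per pixel).
function imageDataBytes(widthPx: number, heightPx: number): number {
  return widthPx * heightPx * 4;
}

// Illustrative example: an A4 page rendered at 300 DPI is about 2480 x 3508 px.
const bytes = imageDataBytes(2480, 3508);
console.log((bytes / (1024 * 1024)).toFixed(1), "MiB"); // prints "33.2 MiB"
```

If several such buffers are alive at once, a few dozen pages are enough to reach hundreds of MiB.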
@Zirafnik Same issue here
Please try v3.3.0
A PDF with many pages causes Node to run out of memory.
This is similar to an issue in the 'pdf2pic' library: https://github.com/yakovmeister/pdf2image/issues/54. It can be worked around with manual batching, using arrays of page numbers, but it is not a sexy solution: you first need to determine the number of pages in the document with an external library (such as 'pdfjs-dist'), and then build the arrays, accounting for a page count that doesn't divide evenly into the batch size.
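The array-building part of the manual batching workaround can be sketched as a small helper. The page count itself would come from pdfjs-dist, whose getDocument(...) returns a loading task whose promise resolves to a document object with a numPages property; that call is left out so the sketch has no dependencies. How each batch is then passed to the converter depends on the library's options and is not shown. The helper name is hypothetical.

```typescript
// Splits pages 1..numPages into consecutive batches of at most batchSize pages.
// The last batch is shorter when numPages is not a multiple of batchSize.
function makePageBatches(numPages: number, batchSize: number): number[][] {
  if (!Number.isInteger(numPages) || numPages < 0) {
    throw new RangeError("numPages must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: number[][] = [];
  for (let start = 1; start <= numPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, numPages);
    const batch: number[] = [];
    for (let page = start; page <= end; page++) batch.push(page);
    batches.push(batch);
  }
  return batches;
}

// The 68-page document from this thread, in batches of 20 pages.
console.log(makePageBatches(68, 20).map((b) => `${b[0]}-${b[b.length - 1]}`));
// prints [ '1-20', '21-40', '41-60', '61-68' ]
```

Page numbers are 1-based here because that is how the pages are referred to in this thread; adjust if a given API expects 0-based indices.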