ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Long single page PDF doesn't render at higher resolution #80

Closed reyjexter closed 1 year ago

reyjexter commented 1 year ago

When we try to render a very long single-page PDF in WASM, no image is displayed. Here's an example:

https://github.com/reyjexter/pdfium-render-wasm/blob/master/www/index.html

When we reduce the size or resolution of the image being rendered, it renders correctly, so this looks like a memory-related issue. What's the best way to have pdfium-render increase the allocated memory?

The expected result is the whole page rendered as an image, but instead we get a blank image. In some of our tests, the top portion of the page was rendered and the rest was cut off. No error appears in the console either.


Thanks again!

ajrcarey commented 1 year ago

Hi @reyjexter, there is no way that I am aware of to control Pdfium's memory utilisation.

The Pdfium WASM module you use must be compiled with a growable memory heap. There are notes on this in the documentation.

Please provide the sample document you are trying to render if you wish me to attempt to reproduce the problem.

reyjexter commented 1 year ago

Here's the example PDF that we are trying to render:

https://github.com/reyjexter/pdfium-render-wasm/blob/master/www/long-content.pdf

The size in pixels at 300 ppi is approximately:

const width = 7410; const height = 84699;

ajrcarey commented 1 year ago

Ok. I won't ask why you are attempting to render at those resolutions; I assume you have your reasons.

There are two separate problems here, one (arguably) to do with Pdfium and pdfium-render, the other browser-specific.

Pixel dimensions in pdfium-render are currently expressed as u16 values (i.e. the maximum is 65535 pixels). This is because Pdfium's measurements are expressed in c_int; Rust's u32 would overflow c_int on most platforms. I'm not sure why I didn't choose i32, which would be a closer match than u16; we could probably revisit this decision. As things currently stand, however, your request for a height of 84699 pixels will never succeed because it overflows a u16 value. The wrapped value received by Rust is actually 19163 (that is, 84699 - 65536).
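To make the wraparound concrete, here is a minimal sketch of the arithmetic. The function names `wrap_to_u16` and `checked_u16` are hypothetical and not part of pdfium-render; the sketch only illustrates how a plain `as` cast truncates silently while a checked conversion reports the overflow. It does not claim to show how the crate performs the conversion internally.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: truncate a requested pixel dimension with an `as` cast.
/// Values above u16::MAX wrap modulo 65536 instead of producing an error.
fn wrap_to_u16(requested: u32) -> u16 {
    requested as u16
}

/// Hypothetical helper: checked conversion that returns None
/// when the requested dimension does not fit in a u16.
fn checked_u16(requested: u32) -> Option<u16> {
    u16::try_from(requested).ok()
}
```

With these helpers, `wrap_to_u16(84699)` yields 19163, matching the value quoted above, whereas `checked_u16(84699)` yields `None` and makes the problem visible to the caller.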

Even with a width of 7410 pixels and a height of "only" 19163 pixels, however, the rendered image is 567 megabytes in size. This brings us to your second problem: Safari and Firefox appear to have upper bounds on the maximum size of an ImageData object. (Safari's limit may be as low as 64 MB.) Neither browser will render an ImageData object of this size.

I was able to successfully render the 567 MB ImageData using Chromium 112.0.5615.49.
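The size quoted above can be reproduced with a short calculation. This sketch assumes 4 bytes per pixel, which is consistent with the figures in this thread but is an assumption rather than something stated explicitly; `bitmap_bytes` is a hypothetical helper, not a pdfium-render function.

```rust
/// Assumption: each pixel occupies 4 bytes (e.g. 8 bits for each of four channels).
const BYTES_PER_PIXEL: u64 = 4;

/// Hypothetical helper: bytes needed for an uncompressed bitmap of the given
/// pixel dimensions. Returns None if the multiplication overflows a u64.
fn bitmap_bytes(width: u32, height: u32) -> Option<u64> {
    (width as u64)
        .checked_mul(height as u64)?
        .checked_mul(BYTES_PER_PIXEL)
}
```

For the wrapped dimensions, `bitmap_bytes(7410, 19163)` gives 567,991,320 bytes, which is the roughly 567 megabytes mentioned above. For the dimensions originally requested, `bitmap_bytes(7410, 84699)` gives 2,510,478,360 bytes, about 2.5 GB.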

Long story short, the problem is not Pdfium per se (although certainly the choice of the Pixels unit datatype in pdfium-render could be revisited), but rather that your target dimensions are pushing the boundaries of what browsers will accept.

ajrcarey commented 1 year ago

Changed PdfBitmap::Pixels datatype definition from u16 to c_int for purposes of testing. Further experimentation suggests that even Chromium has difficulty rendering an ImageData larger than 1 GB, and that Pdfium itself cannot allocate memory for a bitmap larger than 2 GB in size. This may be a WASM-specific limitation, however; I will experiment further to see if the same restriction applies to non-WASM binaries.

In short, it does appear that 65536 pixels is a sensible upper bound for bitmap dimensions.

ajrcarey commented 1 year ago

Testing rendering of the same document on a Linux machine, the maximum bitmap I was able to allocate was 2320723080 bytes in size (2.16 GiB, or about 2.32 GB). This seems to suggest that just over 2 GB is a hard limit, irrespective of the platform. A bitmap that size corresponds to pixel dimensions of 7410 x 78297.
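The relationship between the byte limit and the pixel dimensions can be inverted to find the tallest bitmap that fits at a given width. This is a sketch under the same assumption of 4 bytes per pixel; `max_height_for_width` is a hypothetical helper, and the budget passed to it is simply the largest allocation observed in the test above, not a documented Pdfium constant.

```rust
/// Assumption: each pixel occupies 4 bytes.
const BYTES_PER_PIXEL: u64 = 4;

/// Hypothetical helper: the largest height in pixels such that a bitmap of
/// the given width still fits within `budget_bytes`.
/// Returns 0 when the width is 0 or when not even one row fits.
fn max_height_for_width(width: u32, budget_bytes: u64) -> u64 {
    let row_bytes = width as u64 * BYTES_PER_PIXEL;
    if row_bytes == 0 {
        return 0;
    }
    budget_bytes / row_bytes
}
```

Calling `max_height_for_width(7410, 2_320_723_080)` returns 78297, the height quoted above. The requested height of 84699 pixels exceeds that, so at this width the page would need a lower resolution to fit.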

You will need to reconsider your target resolution, or file a bug report upstream with the Pdfium authors.

ajrcarey commented 1 year ago

Changed PdfBitmap::Pixels datatype definition from u16 to c_int permanently. Added notes on bitmap size limits to inline documentation on the Pixels datatype. Added PdfBitmap::bytes_required_for_size() function to assist with calculating the buffer size required for a bitmap. Updated README.md. Changes will be included in crate release 0.8.1.

reyjexter commented 1 year ago

Thanks for the detailed explanation. Yes, using a lower resolution for some documents is something we are also considering.