Closed NyxCode closed 2 years ago
I tracked my performance issues down to pdfium-render::PdfiumRenderWasmState::copy_bytes_from_pdfium
.
For a 2000x2000 render, it takes a whopping 10ms.
Hi Nyx,
Thanks for raising the issue and I'm pleased someone else is playing around with Pdfium in WASM - I thought I might be the only person using it in the browser!
The distribution of Pdfium as a self-contained WASM module has a big impact on performance. So long as Pdfium and your Rust application are separate WASM modules, they have no safe way to directly share memory - every memory access has to be copied across to Pdfium, and every result generated by Pdfium has to be copied back again into the WASM module containing pdfium-render and your Rust application (this is what you're seeing in the calls to the pdfium-render::PdfiumRenderWasmState::copy_bytes_from_pdfium()
function).
In theory both Pdfium's WASM module and the WASM module for your Rust application could share a single linear WASM memory address space, but because Pdfium (wrapped by emscripten) and your Rust application (wrapped by wasm-bindgen) have independent memory allocators it seems incredibly risky to me to try to share memory - a memory allocation in pdfium-render could inadvertently overwrite memory used by Pdfium, and vice versa. So I have deliberately kept the WASM memory address spaces separate and isolated to each module. This is memory safe, but does have the drawback that all memory allocations have to be copied from one address space to another. It also means your suggestion about using a mutable buffer won't work, because while the buffer might be mutable within pdfium-render's WASM linear memory, any changes made to that buffer would not be visible to Pdfium, since it is using a completely separate WASM memory space.
I'm open to suggestions about how to improve this approach without compromising memory safety. Regarding the pdfium-render::PdfiumRenderWasmState::copy_bytes_from_pdfium()
function specifically, there is some debugging code in there to check the integrity of the copied memory slice, and that check takes linear time. It is only used when debugging checks are turned on. (See https://github.com/ajrcarey/pdfium-render/blob/master/src/wasm.rs#L496.) Can I confirm that when you initialize pdfium-render from your Javascript code, you are not using the debugging flag, i.e. you are initializing pdfium-render using something like this:
initialize_pdfium_render(
Module, // Emscripten-wrapped Pdfium WASM module
rustModule, // wasm_bindgen-wrapped WASM module built from our Rust application
false, // Debugging flag - set this to true to get tracing information logged to the Javascript console
),
with the debugging flag set to false
rather than true
?
Hey, appreciate the quick response!
confirm that when you initialize pdfium-render from your Javascript code, you are not using the debugging flag
I had that flag set to false
. When enabling it, it goes from spending ~10ms to ~230ms in copy_bytes_from_pdfium
.
a memory allocation in pdfium-render could inadvertently overwrite memory used by Pdfium, and vice versa.
~~I'm not sure I fully understand this yet. When rendering a PDF, pdfium allocates a buffer with its allocator, and FPDFBitmap_GetBuffer
then returns a pointer to that buffer, which should be valid until the bitmap is free'ed.
So why cant this pointer be used directly? Besides using it after the bitmap is free'ed, when would this pointer become invalid?~~
Here is the very crude profiling i did: What you see here are the stats for rendering a page to a 2000x2000 canvas. So the 26.5ms is everything from start to PDF on screen.
copying the buffer takes ~35%, while get_bitmap_with_config
takes another ~35%.
My original intention was to get rid of the cost of creating a new buffer for every render by re-using one big buffer for all pages/bitmaps. So that concerns get_bitmap_with_config
.
If we could also get rid of the copying, this would be blazingly fast :wink:
Okay, I finally understood :smile:
As an alternative approach, I tried to directly work with views into Pdfium's heap. Since I don't plan on doing any processing on the rendered bitmap, that works for me.
With that approach, got it from 26.5ms to 17.2ms by avoiding the copy.
Instead of using as_bytes()
to get a copied &[u8]
, I added as_uint8clampedarray
which returns a Uint8ClampedArray
directly. I get that with Uint8ClampedArray::new(&self.heap_u8().buffer())
and then calling subarray(..)
on it.
Here's the breakdown:
Ok, that's good. I'd be open to adding a WASM-specific PdfBitmap::as_uint_8_clamped_array()
or similar to account for this specific situation.
Looking at your flame graph, I'm interested in why get_bitmap_with_config()
is taking up such a large portion of runtime; that seems very high to me. Would you be willing to share a sample from your source code so I can trace exactly what's going on?
sure! let me first do some cleanup though.
@ajrcarey I've pushed my experiment here. it depends on my branch which implements bitmap.as_uint8array
. You can find the JS in pdfoxide-svelte/src/routes/index.svelte
Brilliant, thank you so much, I'm going to take a detailed look at this tomorrow.
Ok, the PR all looks good, I might rename a few things to make it more obvious this is not part of the Pdfium API but the basic approach looks good. I'm going to keep PdfBitmap::as_image_data()
but I will flesh out the documentation to make it clear it is slower than using PdfBitmap::as_uint8_array()
. I'll package this up and release it as 0.7.11 today or tomorrow. Thank you again for the PR!
Renamed FPDFBitmap_GetBuffer_Uint8Array()
in bindings to FPDFBitmap_GetArray()
to more clearly separate retrieving a buffer from a JS array. Renamed PdfBitmap::as_uint8_array()
to PdfBitmap::as_array()
. Expanded documentation to make performance difference between PdfBitmap::as_array()
and PdfBitmap::as_image_data()
explicit. Adjusted PdfBitmapConfig::apply_to_page()
to avoid an unnecessary vector allocation and iteration when no changes have been made to the default form field highlighting settings. Updated documentation.
Published to crates.io as version 0.7.11.
@ajrcarey awesome! I'll open a separate issue to explore if we can make create_empty_bitmap_handle
faster.
Hey! Great crate, had a blast playing around with it!
I tried to beat PDF.js for rendering a page to a canvas, but I can't seem to get the performance any better than PDF.js.
It seems like it always creates a new buffer when calling the
get_bitmap_*
functions. Would it be possible to introduce a variant which takes a mutable reference to a buffer and renders to that? I am not sure how much performance is left on the table here, but it's probably not nothing.