Closed NyxCode closed 2 years ago
The obvious solution would be to just cache the bitmap created by get_bitmap_with_config
, and re-use it for rendering other pages.
For PDFs with varying page sizes, however, this get's a bit more tricky. Will need to think about that a bit.
I have thought a bit about how one would implement a web-based PDF viewer on top of pdfium-render. While that's certainly possible right now, a few features are missing to achieve performance similar to PDF.js:
FPDF_RenderPageBitmapWithMatrix
. With these two features, a "normal" PDF (in which every page has the same size) could be viewed using just one buffer. Zooming and panning could be done by changing the viewport only.
@ajrcarey what do you think?
Ok, I see what you're trying to do. There's two problems that I can see straight away:
pdfium-render
to size any shared bitmap buffer correctly, since it can't know the sizes of all rendering jobs in advance.For both these reasons, I don't think bitmap buffer sharing should be provided directly by pdfium-render
. The solution for your use case is to adjust pdfium-render
so it lets you "bring your own buffer" when rendering a page; that way, you can size the buffer correctly yourself (since you probably know the maximum buffer size you will need for your rendering jobs), and the responsibility for managing the lifetime of the buffer falls to you as the caller.
Specifically, I propose the following adjustments:
PdfPage::get_bitmap()
and PdfPage::get_bitmap_with_config()
to PdfPage::render()
and PdfPage::render_with_config()
- I should have done this some time ago to separate the concept of a bitmap (which is used elsewhere in Pdfium) from page rendering. They are two different things and therefore should be named differently; I did not entirely appreciate this when I first implemented the functions.PdfBitmap::empty()
constructor or similar to let you create an FPDF_BITMAP
handle from your own code.PdfPage::render_into_bitmap()
and PdfPage::render_into_bitmap_with_config()
functions that let you provide your own mutable PdfBitmap as the rendering destination - this lets you reuse the same bitmap over and over without new allocations.The FPDFBitmap_CreateEx()
function called by PdfBitmap::create_empty_bitmap_handle()
lets you pass a pointer to an external byte buffer for Pdfium to use, rather than having Pdfium create the buffer. In other words, Pdfium itself already supports the "bring your own buffer" concept, so in theory we could leverage this for your use case. In practice, however, this won't work correctly when compiling to WASM because of the isolated memory address space complication we talked about during #32. So while it's tempting to try to take advantage of this, I think it's best to avoid it for reasons of cross-platform compatibility.
For reference, the source code for the FPDFBitmap_CreateEx()
function is at https://pdfium.googlesource.com/pdfium/+/refs/heads/main/fpdfsdk/fpdf_view.cpp#806. I don't have profiling data for it but I would assume it's spending most of its time in FPDFBitmapFromCFXDIBitmap()
.
Ah, I see we both reached the same conclusion of separating page rendering from bitmaps independently, and at almost precisely the same time! Great minds think alike :)
Work with bitmaps independently of pages: I believe this is covered by my proposed adjustments above. I have the feeling you have reached the same solution.
Support viewports: this sounds like additional functionality that could be added to PdfBitmapConfig.
I think it's possible to address both of these. Let's deal with item 1 first.
😄 Yeah, I like the API scetch you've laid out above. I'll have to dive a bit deeper into the crate to see where everything would fit in.
For example, you already have a distinction between PdfBitmapConfig
and PdfBitmapRenderSettings
. I feel like it would make sense to keep these two, but make PdfBitmapRenderSettings
public.
Then, you'd either do PdfPage::render_to_bitmap_with_config(&mut bitmap, PdfBitmapRenderSettings::new())
or let pdfium-render create the bitmap for you with PdfPage::render_with_config(PdfBitmapConfig::new(), PdfBitmapRenderSettings::new())
.
PdfBitmapConfig
is the publically visible API. It's effectively a builder pattern, with PdfBitmapRenderSettings
being the private built artifact. I think the builder pattern works ok here and my instinct is the built artifact should remain private, but the naming is problematic; ideally it would be PdfRenderConfig
(or even PdfPageRenderConfig
, since it's specific to PdfPage
) rather than PdfBitmapConfig
.
The basic idea I have is:
PdfPage::render_into_bitmap_with_config()
: takes both a previously created PdfBitmap
and a PdfBitmapConfig
, generates the PdfBitmapRenderSettings
from the PdfBitmapConfig
, and renders the page into the provided bitmap using those render settings.PdfPage::render_into_bitmap()
: takes a previously created PdfBitmap
, creates a default PdfBitmapConfig
, then calls render_into_bitmap_with_config()
with those values.PdfPage::render_with_config()
: takes a PdfBitmapConfig
, creates a new empty PdfBitmap
, then calls render_into_bitmap_with_config()
with those values.PdfPage::render()
: creates a default PdfBitmapConfig
and a new empty PdfBitmap
, then calls render_into_bitmap_with_config()
with those values.I should have time to code this up tomorrow. I will then look into the FPDF_RenderPageBitmapWithMatrix()
function and see how we can integrate that cleanly.
Sounds good!
If there's something i could help with, please let me know.
Also, I'd encourage you to not be hesitant to break the public API. Now's the time to do that, it only get's more difficult once a lot of people depend on your crate.
Ok, I've pushed some changes that adds the option of re-using an existing PdfBitmap
when rendering a page. I've renamed some functions in the public API (deprecating the old names as part of tracking issue #36) and added the new PdfPage::render_into_bitmap()
and PdfPage::render_into_bitmap_with_config()
functions we discussed yesterday. Here's an example of using the new render_into_bitmap_with_config()
function:
fn test_page_rendering_reusing_bitmap() -> Result<(), PdfiumError> {
// Renders each page in the given test PDF file to a separate JPEG file
// by re-using the same bitmap buffer for each render.
let pdfium = Pdfium::new(
Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
.or_else(|_| Pdfium::bind_to_system_library())?,
);
let document = pdfium.load_pdf_from_file("./test/export-test.pdf", None)?;
let render_config = PdfRenderConfig::new()
.set_target_width(2000)
.set_maximum_height(2000)
.rotate_if_landscape(PdfBitmapRotation::Degrees90, true);
let mut bitmap = PdfBitmap::empty(
2500, // Deliberately larger than any page will be
2500,
PdfBitmapFormat::default(),
pdfium.get_bindings(),
)?;
for (index, page) in document.pages().iter().enumerate() {
page.render_into_bitmap_with_config(&mut bitmap, &render_config)?; // Re-uses the same bitmap for rendering each page.
bitmap
.as_image()
.as_rgba8()
.ok_or(PdfiumError::ImageError)?
.save_with_format(format!("test-page-{}.jpg", index), image::ImageFormat::Jpeg)
.map_err(|_| PdfiumError::ImageError)?;
}
Ok(())
}
This example creates a bitmap that is deliberately larger than any of the rendered pages will be. The saved images include the entire bitmap dimensions, not just the portion used for rendering each page; this makes perfect sense, but is perhaps not the ideal (or expected) result. Does the resulting image need to be clipped to the rendered dimensions?
FPDF_RenderPageBitmapWithMatrix()
takes a standard six-value transformation matrix to allow translating, scaling, rotating, and shearing the page image during rendering. It should be possible to support all these transform operations as part of PdfRenderConfig
using an interface similar to that already in place as part of PdfPageObject
, where these transform operations are already supported.
The documentation for FPDF_RenderPageBitmapWithMatrix()
says that the matrix must be "invertible"; I don't recall that requirement being mentioned when transforming page objects, and I'm not sure what the implications are off the top of my head.
FPDF_RenderPageBitmapWithMatrix()
also supports clipping during rendering, which can likewise be supported via PdfRenderConfig
.
@ajrcarey Awesome, I'll check that out tonight!
A matrix is reversible if the linear transformation is reversible - The only case I can think of is if you'd scale everything down to a point or line. Every rotation and every translation is reversible, and every scaling unless the scale for either axis is 0.
That was my thought as well - that every transform operation supported by the matrix is inherently invertible. It seemed odd to me that they would mention it at all. Maybe I'm missing something. Perhaps it will become apparent when adding the transform operations to PdfRenderConfig
.
@ajrcarey How about exposing the transformation matrix more or less directly, instead of sticking with the fields in PdfRenderConfig?
That would allow users to chain transformations together, like Transform::new().scale(2.0).rotate(3.14).translate(100.0, 200.0)
.
The API could look something like this:
#[derive(Copy, Clone)]
pub struct Transform([[f64; 3]; 3]);
impl Transform {
pub fn new() -> Self {
Self([
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0]
])
}
pub fn scale(self, scale: f64) -> Self {
self.scale_axis(scale, scale)
}
pub fn scale_axis(self, scale_x: f64, scale_y: f64) -> Self {
let transform = Transform([
[scale_x, 0.0, 0.0],
[0.0, scale_y, 0.0],
[0.0, 0.0, 1.0]
]);
self * transform
}
pub fn translate(self, x: f64, y: f64) -> Self {
let transform = Transform([
[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[x, y, 1.0]
]);
self * transform
}
pub fn rotate(self, rad: f64) -> Self {
let sin = rad.sin();
let cos = rad.cos();
let transform = Transform([
[cos, sin, 0.0],
[-sin, cos, 0.0],
[0.0, 0.0, 1.0]
]);
self * transform
}
}
impl Mul<Transform> for Transform {
type Output = Transform;
fn mul(self, rhs: Transform) -> Self {
let Self(l) = self;
let Self(r) = rhs;
let mut result = [[0f64; 3]; 3];
for row in 0..3 {
for col in 0..3 {
for k in 0..3 {
result[row][col] += l[row][k] * r[k][col];
}
}
}
Transform(result)
}
}
Yes, that looks good. I like your function chaining approach.
If we were to introduce the ability to pass a Transform
into each of the PdfPage::render()
functions, then we would be breaking the public API. I'm happy to do that if it's necessary, but I don't think in this case it is. PdfRenderConfig
already supports function chaining via the builder pattern, so if the transformation functions are added to that struct, the public API does not need to change. It also makes sense to me to group all the configuration functionality together.
You suggested exposing the transformation matrix separately from PdfRenderConfig
. Why would that be a better approach, in your opinion?
I've push a commit that adds the new functionality to PdfRenderConfig
. The API of the transformation functions matches that already in place for PdfPageObject
; this is important so that pdfium-render
's API feels coherent.
Pdfium's API offers two rendering functions: FPDF_RenderPageBitmapWithMatrix()
that takes a transformation matrix and a clipping region - which is what you're interested in - and FPDF_RenderPageBitmap()
that doesn't. The latter function has been the only one used by pdfium-render
up until now. The rendering of form elements and form data requires two passes and is dependent on the latter function; it is not possible to render form elements and form data while simultaneously applying a transformation matrix. (Well, it is, but the matrix will be applied when rendering the page elements but then ignored when rendering the form elements and form data, so the generated bitmap will be useless.) pdfium-render
will transparently switch between the two rendering functions depending on whether PdfRenderConfig
indicates form data is to be rendered; using any of PdfRenderConfig
's transformation functions automatically turns off rendering of form data. The documentation attempts to make all this clear, but it is worth knowing now, before you get started. If you must render form data and form elements while simultaneously applying a transformation matrix, the only option I can see is to use PdfPage::flatten()
to first flatten the form data and form elements into the page itself. The page can then be rendered using a transformation matrix. This permanently bakes those form elements into the page, however.
I have added the option to specify a custom clipping area via the PdfRenderConfig::clip()
function.
There's a lot of new functionality here and so it's good to have you test at least the bits you're interested in before I push a public release. I have already dealt with some corner cases in PdfRenderConfig
, and I have checked to make sure that the new additions to PdfRenderConfig
don't break any existing rendering tests, so I'm reasonably confident it's all backwards-compatible.
Confirmed all existing tests and examples work when compiling to both native and WASM. Updated documentation.
I'm reasonably confident this is all backwards-compatible and as such I'm happy to upload it to crates.io as release 0.7.13. Waiting for a few days to see if @NyxCode has any feedback / has discovered any bugs that require resolution before publishing.
@ajrcarey Awesome!
I like the API, and I haven't discovered any bugs yet.
I'm currently setting up a web-based PDF viewer to properly test everything out. Will need a few more days to get that up and running. I think you can publish now if you want, and in case we discovered any issues, just publish a patch.
If you're not in a hurry, you can also wait a few days until I get everything up and running
No hurry here, take your time. I'm away until next Wednesday and won't publish anything until I get back.
@ajrcarey small update:
I've put together the foundation of a web-based pdf viewer here. Will implement performance optimizations (only render parts of a page, re-use bitmaps, etc.) in the next few days.
@ajrcarey re-using the bitmap works great, and speeds up the rendering process a lot.
Only constructing it for my use-case is a bit messy - though that's probably not an issue with the API.
Here is how that looks for my usecase right now. A lot of stuff is &'static
since that makes using wasm-bindgen much easier.
let bindings = Pdfium::bind_to_system_library().map_err(js_err)?;
let pdfium: &'static Pdfium = Box::leak(Box::new(Pdfium::new(bindings)));
let bitmap: PdfBitmap<'static> =
PdfBitmap::empty(max_width, max_height, PdfBitmapFormat::default(), pdfium.get_bindings())
.map_err(js_err)?;
let bitmap = Rc::new(UnsafeCell::new(bitmap));
Ok, I'm pleased it's all working well. I'm going to release this as 0.7.13 momentarily.
Is the use of Box::leak(Box::new(...))
in your snippet because you're trying to erase the lifetime? I'm wondering why you don't just store an owned Pdfium struct somewhere and pass around a reference to it as you need it, but no doubt there's a good reason.
@ajrcarey Exactly, that was the intent since wasm-bindgen
can't generate bindings for structs with lifetimes. I don't see a good alternative to this which does not involve managing stuff manually with *mut
pointers everywhere.
Ah, right. So you want to pass PdfBitmap<'a>
(and perhaps other pdfium-render lifetime-bound objects) back and forth between Javascript and WASM. Hmm. Yes, that's tricky. All lifetimes in pdfium-render are fundamentally bound to the lifetime of PdfiumLibraryBindings
, so if that lifetime can be reduced to 'static
then all other objects would also have static lifetimes. I wonder if there's a convenient way of doing that.
All lifetimes in pdfium-render are fundamentally bound to the lifetime of PdfiumLibraryBindings
Additionally, PdfPage<'a>
borrows from &'a PdfDocument
, right?
Yes, but the lifetime of PdfDocument<'a>
is itself bound to PdfiumLibraryBindings
. I suppose it depends on whether the borrow checker is smart enough to realise <'a> in PdfPage
and PdfDocument
are ultimately both referring to the same lifetime; my guess is that it probably can figure that out, since there is a coherent chain of lifetimes that all end at PdfiumLibraryBindings
.
Super impressed by your demo at https://svelte-pdf-viewer.nyxcode.com/, by the way, @NyxCode! Looks and feels terrific.
@ajrcarey thanks!
get_bitmap_with_config
is quite slow on WASM. For a 2000x2000 bitmap, it takes ~11ms on my machine.Most of that time is spent in
create_empty_bitmap_handle
. I will explore what optimizations can be done to improve this, especially on WASM.After #32, which made the actual rendering process much faster, here is the crude flamegraph I generated.
Unlike the graphs in #32, this test is more realistic, rendering a real page with a lot of content:
"TOTAL" is the whole rendering process, starting with getting a specific page from a document, and ending with drawing the bitmap to a canvas.