Closed hhio618 closed 1 year ago
Hi @hhio618, good to hear from you.
The impact of recreating the library bindings on each call should be minimal (i.e. milliseconds). However, if you want to get it as close to zero as possible, I suggest one of the following approaches:
1. Bind to Pdfium statically. Creating the PdfiumLibraryBindings struct for a statically-bound Pdfium library is a zero-time operation.
2. Create the PdfiumLibraryBindings object instance once, then store it in whatever state container you are currently using (presumably created by lazy_static or similar). You can then access it each time you need to make a call to a Pdfium function.
If you are not currently storing any shared state in your Rust code, then statically binding is the way to go.
I don't have a specific long-form example of this but if you give me an overview of your set of functions (doesn't need to be the entire code), I can write up a sketch for you of how option 2 would look.
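In the meantime, the general shape of option 2 might look roughly like the sketch below. It is deliberately independent of pdfium-render: Bindings stands in for the boxed PdfiumLibraryBindings, and AppState, render() and bind_count() are hypothetical names invented for illustration. It only shows the idea of paying the construction cost once and borrowing the result on every call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the (pretend) expensive binding step has run.
static BIND_COUNT: AtomicUsize = AtomicUsize::new(0);

// Stand-in for `Box<dyn PdfiumLibraryBindings>`; NOT a pdfium-render type.
pub struct Bindings;

impl Bindings {
    // Pretend this locates and loads a shared library from disk.
    pub fn load() -> Self {
        BIND_COUNT.fetch_add(1, Ordering::SeqCst);
        Bindings
    }
}

// Hypothetical state container owned by the application.
pub struct AppState {
    bindings: Bindings,
}

impl AppState {
    // The only place where the bindings are created.
    pub fn new() -> Self {
        AppState { bindings: Bindings::load() }
    }

    // Every call borrows the stored bindings instead of reloading them.
    pub fn render(&self, page_index: usize) -> String {
        let _bindings: &Bindings = &self.bindings;
        format!("rendered page {page_index}")
    }
}

// Exposes the counter so the reuse can be observed.
pub fn bind_count() -> usize {
    BIND_COUNT.load(Ordering::SeqCst)
}
```

Creating one AppState and calling render() many times leaves bind_count() unchanged after construction, which is the whole point of the approach.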
PS if you are rendering/manipulating content from the same PDF document over and over again, then there is a performance impact associated with repeatedly opening and closing the PDF document. This will be far more noticeable than loading the library bindings on every call and I would focus your attention on this area. You want to hold onto your PDF objects (document, page, etc.) for as long as you can if you want maximum performance.
Hi @ajrcarey, Thanks for the info!
Sure, this is what I'm currently doing:
use std::{
    env,
    io::{Read, Seek},
    path::PathBuf,
};

use image::DynamicImage;
use pdfium_render::prelude::*;

pub enum PDFQuality {
    High,
    Medium,
    Low,
}

// Binds to the Pdfium library bundled at build time, falling back to a system-wide copy.
fn initialize_pdfium() -> Box<dyn PdfiumLibraryBindings> {
    let out_path = env!("OUT_DIR");
    let pdfium_lib_path =
        PathBuf::from(&out_path).join(Pdfium::pdfium_platform_library_name());
    let bindings = Pdfium::bind_to_library(
        #[cfg(target_os = "android")]
        Pdfium::pdfium_platform_library_name_at_path("./"),
        #[cfg(not(target_os = "android"))]
        pdfium_lib_path.to_str().unwrap(),
    )
    .or_else(|_| Pdfium::bind_to_system_library());
    match bindings {
        Ok(binding) => binding,
        Err(e) => {
            panic!("{:?}", e)
        }
    }
}

pub fn render_preview_page<R>(data: R, quality: PDFQuality) -> DynamicImage
where
    R: Read + Seek + 'static,
{
    let render_cfg = PdfBitmapConfig::new();
    let render_cfg = match quality {
        PDFQuality::High => render_cfg.set_target_width(2000),
        PDFQuality::Medium => render_cfg,
        PDFQuality::Low => render_cfg.thumbnail(50),
    }
    .rotate_if_landscape(PdfBitmapRotation::Degrees90, true);
    // The bindings are recreated and the document is reloaded on every call.
    Pdfium::new(initialize_pdfium())
        .load_pdf_from_reader(data, None)
        .unwrap()
        .pages()
        .get(0)
        .unwrap()
        .get_bitmap_with_config(&render_cfg)
        .unwrap()
        .as_image()
}
Thank you for the sample, that's excellent.
So, the time cost here is the call to load_pdf_from_reader(), which is called every time your calling code invokes render_preview_page(). Because your PdfiumLibraryBindings are being instantiated every time render_preview_page() is invoked, the call to load_pdf_from_reader() must reinflate the in-memory representation of the PDF document you want to render. Depending on the size of the document, this can take a noticeable amount of time. My suggestion is that you try to focus on avoiding reloading the document.
1. Call initialize_pdfium() just once, and cache the result in a lazy_static. (If you run into problems with lifetimes, I can try to help you with that.)
2. load_pdf_from_reader() is a great choice here because, when using a reader, Pdfium is very good about (a) only loading the parts of the file it needs (much faster when working with big documents) and (b) caching things in memory. This is exactly what you want for optimal performance.
3. Your calling code must call initialize_pdfium(), and you will also need a corresponding drop_pdfium() (or similar) function that drops your cached PdfiumLibraryBindings instance (unless you don't care about memory leaks :)
This approach involves more boilerplate, because you must now introduce a lazy_static and manage it, but it will definitely be faster. Depending on the size of your document, it may be noticeably faster.
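To make the initialize/drop pairing concrete, here is a minimal sketch that uses only the standard library. It assumes a hypothetical CachedBindings type in place of the real PdfiumLibraryBindings, a made-up library name, and a hypothetical with_bindings() helper; none of these come from pdfium-render. A static Mutex<Option<...>> is used rather than lazy_static purely so the example has no external dependencies.

```rust
use std::sync::Mutex;

// Stand-in for the cached library bindings; NOT a pdfium-render type.
pub struct CachedBindings {
    pub library_name: String,
}

// `None` until initialize_pdfium() runs, and `None` again after drop_pdfium().
static CACHE: Mutex<Option<CachedBindings>> = Mutex::new(None);

// Creates the bindings only if they are not already cached.
pub fn initialize_pdfium() {
    let mut slot = CACHE.lock().unwrap();
    if slot.is_none() {
        // "libpdfium" is a placeholder name for this sketch.
        *slot = Some(CachedBindings { library_name: "libpdfium".to_string() });
    }
}

// Releases the cached bindings; returns true if there was something to release.
pub fn drop_pdfium() -> bool {
    CACHE.lock().unwrap().take().is_some()
}

// Runs `f` against the cached bindings, or returns None if they are not initialized.
pub fn with_bindings<T>(f: impl FnOnce(&CachedBindings) -> T) -> Option<T> {
    CACHE.lock().unwrap().as_ref().map(f)
}
```

The calling code would invoke initialize_pdfium() once at startup, use with_bindings() for each operation, and call drop_pdfium() on shutdown. Calling initialize_pdfium() twice is harmless in this sketch because the second call finds the slot already filled.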
I'm not sure how the rendered bitmap data is transferred from Rust to your calling code, but you want to avoid a copy there if at all possible. (I'm not sure how much control over that you get from JNI.) The larger the bitmap image, the more noticeable the latency introduced by a copy will be.
PS after thinking about it a bit more, you may not even need an initialize_pdfium() function because that will take place automatically during your lazy_static (or once_cell, if you prefer that approach rather than lazy_static) setup.
It's possible lifetimes might be problematic when working with these static initialisers. I am happy to help you with that.
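As a rough illustration of initialization happening automatically on first use, the sketch below uses std::sync::OnceLock, the standard library's counterpart to once_cell's OnceCell. Engine is a hypothetical stand-in for the Pdfium struct, and engine() and init_calls() are names made up for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the initializer closure has actually run.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the `Pdfium` struct; NOT the real pdfium-render type.
pub struct Engine {
    pub id: usize,
}

static ENGINE: OnceLock<Engine> = OnceLock::new();

// Returns the shared engine, creating it on first use.
// No separate initialize function is needed by the caller.
pub fn engine() -> &'static Engine {
    ENGINE.get_or_init(|| {
        let n = INIT_CALLS.fetch_add(1, Ordering::SeqCst) + 1;
        Engine { id: n }
    })
}

// Exposes the counter so single initialization can be observed.
pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}
```

Every caller simply asks for engine(); the first call runs the closure and later calls return the same reference.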
I ran into some problems while trying this. Would you please show me a sample snippet?
Sure. Based on your sample code, I was thinking along the lines of the following:
use image::DynamicImage;
use once_cell::sync::OnceCell;
use pdfium_render::prelude::*;
use std::{
    env,
    io::{Read, Seek},
    path::PathBuf,
};

static PDFIUM: OnceCell<Pdfium> = OnceCell::new(); // static initializers must impl Sync + Send

pub enum PDFQuality {
    High,
    Medium,
    Low,
}

fn initialize_pdfium() {
    let out_path = env!("OUT_DIR");
    let pdfium_lib_path = PathBuf::from(&out_path).join(Pdfium::pdfium_platform_library_name());
    let bindings = Pdfium::bind_to_library(
        #[cfg(target_os = "android")]
        Pdfium::pdfium_platform_library_name_at_path("./"),
        #[cfg(not(target_os = "android"))]
        pdfium_lib_path.to_str().unwrap(),
    )
    .or_else(|_| Pdfium::bind_to_system_library())
    .unwrap();
    // Instead of returning the bindings, we cache them in the static initializer.
    // set() returns Err if the cell is already filled; ignoring it makes repeat calls harmless.
    let _ = PDFIUM.set(Pdfium::new(bindings));
}

pub fn render_preview_page<R>(data: R, quality: PDFQuality) -> DynamicImage
where
    R: Read + Seek + 'static,
{
    let render_cfg = PdfBitmapConfig::new();
    let render_cfg = match quality {
        PDFQuality::High => render_cfg.set_target_width(2000),
        PDFQuality::Medium => render_cfg,
        PDFQuality::Low => render_cfg.thumbnail(50),
    }
    .rotate_if_landscape(PdfBitmapRotation::Degrees90, true);
    PDFIUM
        .get() // Retrieves the previously-created Pdfium instance from the static initializer
        .unwrap() // Panics if initialize_pdfium() has not been called yet
        .load_pdf_from_reader(data, None)
        .unwrap()
        .pages()
        .get(0)
        .unwrap()
        .get_bitmap_with_config(&render_cfg)
        .unwrap()
        .as_image()
}
There's a lot of unwrap()-ping going on here, which isn't great for safety, but for a proof-of-concept I guess it's ok for now.
Creating a static instance of Pdfium requires that struct to implement both the Sync and Send traits, which it currently does not do. I have made a commit that adds a new feature, sync, that adds this. Set pdfium-render as a git dependency in your Cargo.toml, and activate both the sync and thread_safe features. You should now be able to compile the example above.
I have confirmed that the sample above compiles, which is not quite the same as confirming that it works :) Whether it is safe for the Pdfium struct to implement Sync and Send is open to some debate, since Pdfium itself is not thread-safe. But this is the approach I would take to start with. pdfium-render does marshal calls to Pdfium in a thread-safe manner, even when running in multi-threaded code, so long as the thread_safe feature is enabled, so in theory it should work :)
If this does work without segfaulting Pdfium, and you are repeatedly reading from the same document in render_preview_page(), then the next step would be to try to get that PdfDocument reference into a static cell as well. This would save you from repeatedly opening and closing your document, which I think is likely to be the biggest source of noticeable performance lag.
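A document cache along those lines could be shaped roughly as follows. This is only a sketch under simplifying assumptions: Document is an owned stand-in rather than the real PdfDocument (which is where the lifetime questions mentioned earlier would come in), documents are keyed by a path string, and open_document(), cached_document(), render_preview() and load_count() are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

// Counts how many times a document has actually been opened.
static LOADS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for an opened PDF document; NOT pdfium-render's PdfDocument.
pub struct Document {
    pub path: String,
    pub page_count: usize,
}

// Pretend this parses the file; the counter makes reuse observable.
fn open_document(path: &str) -> Document {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Document { path: path.to_string(), page_count: 3 }
}

static DOCUMENTS: OnceLock<Mutex<HashMap<String, Arc<Document>>>> = OnceLock::new();

// Returns the cached document for `path`, opening it only on the first request.
pub fn cached_document(path: &str) -> Arc<Document> {
    let cache = DOCUMENTS.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    map.entry(path.to_string())
        .or_insert_with(|| Arc::new(open_document(path)))
        .clone()
}

// Pretend rendering: returns a label for the page, or None if it is out of range.
pub fn render_preview(path: &str, page: usize) -> Option<String> {
    let doc = cached_document(path);
    (page < doc.page_count).then(|| format!("{}#{}", doc.path, page))
}

pub fn load_count() -> usize {
    LOADS.load(Ordering::SeqCst)
}
```

Rendering several pages of the same path opens the document once. A real version would also need a way to evict entries, otherwise every document ever opened stays in memory.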
Your sample code only reads from a document; it doesn't change the document. If you did want to change an existing document, then another way you could improve performance would be to change pdfium-render's default approach to regeneration of content streams. (The default setting is very convenient, but not optimal for performance when making many changes to a document.) But if you're only rendering existing documents, then you won't need to worry about that.
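To show why regenerating after every change costs more than regenerating once at the end, here is a toy model. SketchPage and all of its methods are hypothetical and do not correspond to pdfium-render's actual types or settings; the sketch only counts how often an expensive rebuild would run under each policy.

```rust
// Hypothetical page type used only for illustration; NOT a pdfium-render type.
pub struct SketchPage {
    objects: Vec<String>,
    auto_regenerate: bool,
    dirty: bool,
    regenerations: usize,
}

impl SketchPage {
    pub fn new(auto_regenerate: bool) -> Self {
        SketchPage {
            objects: Vec::new(),
            auto_regenerate,
            dirty: false,
            regenerations: 0,
        }
    }

    // Records a change; in automatic mode the content is rebuilt immediately.
    pub fn add_object(&mut self, object: &str) {
        self.objects.push(object.to_string());
        self.dirty = true;
        if self.auto_regenerate {
            self.regenerate();
        }
    }

    // Rebuilds the (pretend) content stream if anything changed since the last rebuild.
    pub fn regenerate(&mut self) {
        if self.dirty {
            self.regenerations += 1;
            self.dirty = false;
        }
    }

    pub fn regenerations(&self) -> usize {
        self.regenerations
    }

    pub fn object_count(&self) -> usize {
        self.objects.len()
    }
}
```

With automatic regeneration, three changes trigger three rebuilds; with manual regeneration, the same three changes followed by one explicit regenerate() call trigger a single rebuild.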
@ajrcarey Thanks for your time, this helps us solve our performance problem!
That's great - so it does work, then? Pdfium doesn't segfault or otherwise complain?
Works great! I didn't see any problem.
Excellent. I will run some more tests here, but all going well the new sync feature will be released in crate version 0.7.26. If I don't detect any problems in my tests, I may even make it a default feature.
Tests showed no problems in enabling the sync feature by default. Updated README.md. Scheduled for inclusion in crate version 0.7.26.
Hi, we are working on a project that needs maximum performance for bulk PDF preview generation over an Android JNI bridge. Currently, we're loading the library bindings on every call to the library and we're not sure whether that is efficient. Could you please give me an example demonstrating loading the library once and then using it on each call? https://github.com/ARK-Builders/ARK-Navigator/pull/271