ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render
Other
341 stars 52 forks source link

Expose FPDF_GetPageSizeByIndexF and cache DynamicPdfiumBindings #141

Closed DorianRudolph closed 3 months ago

DorianRudolph commented 5 months ago

First, sorry for combining effectively two PRs into one, but exposing FPDF_GetPageSizeByIndexF also requires changing the bindings.

FPDF_GetPageSizeByIndexF is much faster than loading each page and then obtaining its dimensions. For a large book with hundreds of pages, FPDF_GetPageSizeByIndexF only takes a fraction of a second vs several seconds.

I also noticed that the DynamicPdfiumBindings reload the symbol from libloading for every call. It is much more efficient to store the function pointers in the struct. This should bring the overhead in line with "normal" dynamic linking. Taking the function pointers out of the symbol is "unsafe" but it is fine as long as their lifetime does not exceed that of the Library (see https://users.rust-lang.org/t/multiple-objects-with-interdependent-lifetimes-in-the-same-struct/16507/3). I created the symbol table with the following script: https://gist.github.com/DorianRudolph/1be46b909764faa5559673936342bc3f

Lastly, you can also use the StaticPdfiumBindings with dynamic linking, for which I added the PDFIUM_DYNAMIC_LIB_PATH environment variable to the build.rs. This does not require any changes besides linker flags.

DorianRudolph commented 5 months ago

I don't know why the tests fail, they run fine locally.

ajrcarey commented 4 months ago

Thank you so much @DorianRudolph , this is superb work. Please accept my apologies for the delay in replying, I have some deadlines on some other projects that are distracting me from pdfium-render.

I will reply more fully later, but I really like the idea of caching the dynamic function pointers to improve performance - thank you for doing this.

If you're interested in figuring out why the tests are failing in github, you can reproduce the github action workflow locally (at least on a linux machine) by following the instructions at the top of https://github.com/ajrcarey/pdfium-render/blob/master/.github/workflows/build_test.yml.

DorianRudolph commented 4 months ago

No worries. The tests seem to be an unrelated issue: https://github.com/ajrcarey/pdfium-render/pull/143

ajrcarey commented 3 months ago

Thanks again for preparing this change, @DorianRudolph . I hope to get it merged over the weekend. Question: for the PDFIUM_DYNAMIC_LIB_PATH, does the value of the variable actually matter? I noticed in the build.rs script that you don't seem to actually pass the value of the variable to rustc.

DorianRudolph commented 3 months ago

The path is given here:

    } else if let Ok(path) = std::env::var("PDFIUM_DYNAMIC_LIB_PATH") {
        // Instruct cargo to dynamically link the given library during the build.
        println!("cargo:rustc-link-lib=dylib=pdfium");
        println!("cargo:rustc-link-search=native={}", path); // <--
    }
ajrcarey commented 3 months ago

Merged changes. Renamed PdfPages::get_size() to page_size() and PdfPages::get_page_sizes() to page_sizes() for consistency with Page::page_size(). Removed new PdfSize struct in favour of existing PdfRect for consistency with Page::page_size(). Simplified implementation of PdfPages::get_page_sizes(). Added doc comments. Added unit test coverage. Updated README. Ready to release as crate version 0.8.22.