ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render
Other
362 stars 59 forks source link

Keeping a global, open set of PdfDocuments around #94

Closed Lampyrida closed 1 year ago

Lampyrida commented 1 year ago

Hello! Back with another question.

Building on issue #59, I've been trying to get the following to work:

[...] then the next step would be to try to get that PdfDocument reference into a static cell as well. This would save you from repeatedly opening and closing your document, which I think is likely to be the biggest source of noticeable performance lag.

I tried setting this up as follows (with the sync and thread_safe features enabled):

static PDFIUM: once_cell::sync::OnceCell<Pdfium> = once_cell::sync::OnceCell::new();
static PDF_DOCUMENTS: once_cell::sync::OnceCell<Vec<PdfDocument<'static>>> = once_cell::sync::OnceCell::new();

However, this gives me the following error:

`*mut pdfium_render::bindgen::fpdf_document_t__` cannot be shared between threads safely
within `pdfium_render::document::PdfDocument<'static>`, the trait `Sync` is not implemented for `*mut pdfium_render::bindgen::fpdf_document_t__`
required for `Unique<pdfium_render::document::PdfDocument<'static>>` to implement `Sync`
required for `once_cell::imp::OnceCell<Vec<pdfium_render::document::PdfDocument<'static>>>` to implement `Sync`

fpdf_document_t__ implements !Send and !Sync.

My next attempt was to use thread local storage to work around this:

thread_local! {
    static PDFIUM: once_cell::unsync::OnceCell<Pdfium> = once_cell::unsync::OnceCell::new();
    static PDF_DOCUMENTS: once_cell::unsync::OnceCell<Vec<PdfDocument<'static>>> = once_cell::unsync::OnceCell::new();
}

The issue I run into here is that I can't figure out how to get the lifetimes to play nice when going through the with accessor:

fn test() {
    PDFIUM.with(|pdfium| {
        let p = pdfium.get().unwrap();
        PDF_DOCUMENTS.with(|pdf_documents| {
            let mut pd = pdf_documents.borrow_mut();
            pd.push(p.load_pdf_from_byte_vec(vec![], None).unwrap());
        })
    })
}

This gives the following error:

error[E0521]: borrowed data escapes outside of closure
   --> src/lib.rs:499:17
    |
498 |     PDFIUM.with(|pdfium| {
    |                  ------
    |                  |
    |                  `pdfium` is a reference that is only valid in the closure body
    |                  has type `&'1 once_cell::unsync::OnceCell<pdfium_render::pdfium::Pdfium>`
499 |         let p = pdfium.get().unwrap();
    |                 ^^^^^^^^^^^^
    |                 |
    |                 `pdfium` escapes the closure body here
    |                 argument requires that `'1` must outlive `'static`

I think this makes sense. Within the context of the with closure, pdfium is just a local reference. But I then try to obtain a PdfDocument<'static> from it. It's possible I should be able to get the thread local storage version to work, but I can't figure out how to get the lifetimes to work out.

Would appreciate any input/advice here!

ajrcarey commented 1 year ago

fpdf_document_t__ implements !Send and !Sync.

Where? I don't see that anywhere. It does not implement Send or Sync, but that is not the same thing as deliberately implementing !Send and!Sync.

If you wanted to experiment with this further, you could add implementations for Send andSync for PdfDocument in document.rs. This may allow you to progress your first attempt.

Be aware that I have no time to work on this before the release of 0.9.0.

Lampyrida commented 1 year ago

Thanks for the quick reply @ajrcarey. Upon digging a bit more, I might have misstated that a bit. The issue is that raw pointers do not implement Send and Sync by default. PdfDocument contains a *mut fpdf_document_t__ which therefore means that PdfDocument does not implement Send/ Sync.

As you suggested, defining unsafe impl Send for PdfDocument {} seems to work. Would it make sense to automatically include this when the sync feature is enabled?

Once more quirk I ran into as I played around with this: std::sync::OnceLock::set() or equivalently once_cell::sync::OnceCell::set() both return a Result<(), E> where E: std::fmt::Debug. Pdfium does not implement std::fmt::Debug which means you can't call unwrap() on the result. I.e., this does not compile:

static PDFIUM: std::sync::OnceLock<Pdfium> = std::sync::OnceLock::new();

fn main() {
    STATE.pdfium.set(Pdfium::new(Pdfium::bind_to_library("libpdfium.so").unwrap())).unwrap();  // Outer `unwrap()` does not work.
}

Would it make sense for Pdfium to implement std::fmt::Debug?

Thanks again, and happy to wait until some time after 0.9.0.

ajrcarey commented 1 year ago

Defining unsafe impl Send for PdfDocument {} seems to work. Would it make sense to automatically include this when the sync feature is enabled?

Yes :)

Would it make sense for Pdfium to implement std::fmt::Debug?

Also yes.

I'll try to do both of these over the weekend. In theory both are straight forward.

ajrcarey commented 1 year ago

Added Sync and Send implementations for PdfDocument when using sync crate feature. Added Debug implementations for Pdfium and PdfDocument.

If this is sufficient for your use case, let me know and I'll close the issue.

ajrcarey commented 1 year ago

As there have been no further comments and I believe your initial requirement (being able to use PdfDocument with a static cell) has now been addressed, I am closing the issue. Feel free to re-open if necessary.