ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render
Other
364 stars 59 forks source link

Standardise lifetime and reference handling for child objects inside PdfDocument. #47

Closed ajrcarey closed 1 year ago

ajrcarey commented 2 years ago

As of 0.7.19, the child objects inside PdfDocument are created and returned using a variety of different approaches:

Standardise the approach. All collections should create private owned collection instances inside the containing object as part of the object's new() constructor, function, then hand out mutable or immutable references as necessary. New object instances should never be handed out.

ajrcarey commented 2 years ago

Revised bookmarks, metadata, permissions, and signatures collections in time for 0.7.20.

ajrcarey commented 2 years ago

Additionally, PdfPage::boundaries() has the same problem, in that the child object supports both immutable and mutable access but does not distinguish between these access types, instead handing out a new object instance each time it is called from PdfPage::boundaries(). Reworked as PdfPage::boundaries() and PdfPage::boundaries_mut() in time for 0.7.20.

ajrcarey commented 2 years ago

The separation of PdfDocument::pages() into PdfDocument::pages() and PdfDocument::pages_mut() is the most consequential change with the potential to affect the most users. It is also likely to be the hardest to implement, depending on how many PdfPage child objects expect to be able to take a direct reference to PdfDocument. Save this for 0.7.21.

ajrcarey commented 2 years ago

On reflection, better to save a potentially breaking change like this for 0.8.0. This means that fleshing out the implementation of PdfPageFormFragmentObject should be done first, before completing this issue.

ajrcarey commented 2 years ago

Corrected a few typos in examples. Added PdfAttachment::len() function that returns the data length without requiring a buffer allocation.

ajrcarey commented 2 years ago

Added bindings for all remaining FPDFDest_*() and FPDFLink_*() functions. Implemented native, static, and thread-safe bindings; WASM implementations still pending. Should be ready for inclusion in crate version 0.7.21.

ajrcarey commented 2 years ago

Completed WASM bindings for release 0.7.21.

ajrcarey commented 2 years ago

Fixed a bug in the WASM implementation of FPDFAnnot_GetQuadPoints(). Updated README.md.

ajrcarey commented 2 years ago

Removed unnecessary &mut bindings from various rendering function declarations in PdfBitmap.

ajrcarey commented 2 years ago

Moved commentary about interface differences between native and WASM builds out of README.md into examples/README.md.

ajrcarey commented 2 years ago

Improved implementation of PdfPageImageObject::set_image(). Adjusted signatures of all functions in PdfPageImageObject that take image::DynamicImage so they take a reference to the DynamicImage rather than taking ownership of it. Updated examples/image.rs.

ajrcarey commented 2 years ago

Changed return type of Page::label() from Option<&String> to more idiomatic Option<&str>. Noticed that bindgen no longer generates a type definition for size_t; included this directly in the bindgen module defined in lib.rs.

ajrcarey commented 1 year ago

Updated examples/thread_safe.rs to clearly separate render configuration from render invocation.

ajrcarey commented 1 year ago

67 deprecates PdfPages::delete_page_at_index() and PdfPages::delete_page_range(). However, if PdfDocument::pages() were to return a &PdfPages / &mut PdfPages reference (rather than the owned PdfPages instance it returns at the moment), and if PdfPages::get() were to return a &PdfPage / &mut PdfPage reference (rather than the owned PdfPage instance it returns at the moment), then it might be safe to reinstate these functions, as Rust would be able to manage the lifetimes safely.

This would require this issue to not only split PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages, but also split PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPages::get_mut() -> &mut PdfPage.

ajrcarey commented 1 year ago

Added PdfPage::links() -> &PdfPageLinks and PdfPage::links_mut() -> &mut PdfPageLinks as part of #68.

ajrcarey commented 1 year ago

Completing implementation of reading form fields in #76 allows us to proceed with crate version 0.8.0.

ajrcarey commented 1 year ago

Implemented split of PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages. Reworked internal handle() functions to return direct FPDF_* objects rather than object references, to improve cache locality on modern CPUs. Tested and updated all examples. Updated documentation and README.md.

ajrcarey commented 1 year ago

Early experiments with splitting PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPages::get_mut() -> &mut PdfPage suggest that the management of lifetimes may be intractable. A hashmap can be used to store the FPDF_PAGE pointers and corresponding PdfPage instances, but returning a reference from that hashmap - even an immutable reference - by using interior mutability in PdfPages::get() is not possible because the Ref<> guard wrapping the item returned from the hashmap is a temporary value that is immediately dropped when returning from PdfPage::get().

(Interior mutability in PdfPage seems necessary because we certainly don't want PdfPage::get() to require &mut self. Alternatively, we might be able to use a separate PdfPageCache, a la PdfPageIndexCache, but that seems like a lot of additional complexity.)

ajrcarey commented 1 year ago

Clearly, splitting PdfPages::get() -> PdfPage into PdfPages::get() -> &PdfPage and PdfPage::get_mut() -> &mut PdfPage is a considerably more complex piece of work than splitting PdfDocument::pages() -> PdfPages into PdfDocument::pages() -> &PdfPages and PdfDocument::pages_mut() -> &mut PdfPages. Even if splitting PdfPages::get() is possible, it's a lot of change to package as a single release. Let's release the PdfDocument::pages() change as 0.8.0, and assess separately whether splitting PdfPages::get() is worth doing in a separate release.

ajrcarey commented 1 year ago

Released as crate version 0.8.0.