ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Issues involving flip_vertically/horizontally #93

Closed Lampyrida closed 1 year ago

Lampyrida commented 1 year ago

Hey there!

I was playing around with the flip_horizontally and flip_vertically functions from PdfPage. When I first tried using these on a particular page and then rendered the page right after, I would consistently get a blank page.

Minimal reproducing code looks like:

let pdf_document = pdfium.load_pdf_from_file(".....", None).unwrap();
let mut pdf_page = pdf_document.pages_mut().get(0).unwrap();
pdf_page.flip_horizontally().unwrap(); // Or flip_vertically

// let pdf_page = pdf_document.pages().get(0).unwrap();   // Required to get the render call below to pick up the transformation.

let width_in = pdf_page.width().to_inches();
let width_px = width_in * 100.0; // 100 DPI
let config = PdfRenderConfig::new().set_target_width(width_px.round() as Pixels);
let image = pdf_page.render_with_config(&config).unwrap().as_image();
image.save("....png").unwrap();

I spent a bunch of time digging into this and uncovered two issues here. It's probably worth generating separate issues for these, but I figured I'd start with a single one to explain how I ran into this.

  1. The first issue is that flip_horizontally/flip_vertically seem to mirror the entire page and its contents around the (0,0) origin. That means that everything ends up in negative PDF user space. The function documentation does reference that all objects are mirrored about the origin, but I had imagined that it would translate the objects accordingly to keep them on the page. What I ended up having to do to get the behavior I was after was pdf_page.transform(-1.0, 0.0, 0.0, 1.0, pdf_page.width().value, 0.0). This flips everything, and translates it back a full page width to bring it back into positive user space. If the current behavior of flip_horizontally and flip_vertically is as expected, then disregard!
  2. There seems to be a bug where rendering a PdfPage object that was transformed with transform() does not reflect the transformation. To work around this, I had to recreate the PdfPage object by calling pages() again on the PdfDocument object. See the commented-out line in the code block above. I'm guessing this is a PDFium bug and not a pdfium-render bug. Possibly it's related to this PDFium issue. If this is indeed a PDFium bug, I'm not sure if this is something pdfium-render should provide a built-in workaround for, but I'm flagging it just in case. I am using PDFium version 5854, a prebuilt library downloaded from https://github.com/bblanchon/pdfium-binaries/releases/download/chromium/
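
To illustrate the first point, here is a minimal sketch of how a six-value PDF transformation matrix (a, b, c, d, e, f) maps a point, using the standard PDF convention x' = a*x + c*y + e and y' = b*x + d*y + f. The Matrix type, the helper names, and the 612-point page width are assumptions made up for this example; none of them are part of pdfium-render.

```rust
/// A PDF-style affine transform: x' = a*x + c*y + e, y' = b*x + d*y + f.
/// Hypothetical type for illustration only; not a pdfium-render type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Matrix {
    a: f32,
    b: f32,
    c: f32,
    d: f32,
    e: f32,
    f: f32,
}

impl Matrix {
    /// Applies the transform to a point given as (x, y).
    fn apply(self, (x, y): (f32, f32)) -> (f32, f32) {
        (
            self.a * x + self.c * y + self.e,
            self.b * x + self.d * y + self.f,
        )
    }
}

/// Mirror about the vertical axis through the origin (x becomes -x).
fn flip_about_origin() -> Matrix {
    Matrix { a: -1.0, b: 0.0, c: 0.0, d: 1.0, e: 0.0, f: 0.0 }
}

/// The same mirror, shifted right by one page width so that content
/// lands back in positive user space.
fn flip_within_page(page_width: f32) -> Matrix {
    Matrix { e: page_width, ..flip_about_origin() }
}

fn main() {
    let page_width = 612.0; // assumed page width in points
    let point = (100.0, 200.0);

    // Mirroring about the origin pushes the point to a negative x coordinate.
    println!("{:?}", flip_about_origin().apply(point));

    // Adding e = page_width keeps the mirrored point on the page.
    println!("{:?}", flip_within_page(page_width).apply(point));
}
```

With these assumed values, the point (100, 200) lands at x = -100 after the plain flip and at x = 512 once the page-width offset is included, which is why the transform() call above passes pdf_page.width().value as its fifth argument.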

ajrcarey commented 1 year ago

Hi @Lampyrida , thank you for reporting your two issues. I think it's fine to deal with both here in the same place.

Let's deal with your second one first, since it's easier. I agree that this appears to be an upstream bug in Pdfium and that the bug report you linked to at https://groups.google.com/g/pdfium-bugs/c/T2MwyYsAuUk seems a plausible candidate (although, if I'm reading that bug report right - and perhaps I'm not? - a patch was apparently submitted to fix it in 2019). It is quite straightforward to work around this in pdfium-render by simply dropping the internal FPDF_PAGE handle and reacquiring it after each page transformation operation. I have pushed a small change that does this. So, if you take pdfium-render as a git dependency, you should no longer need the commented line of code in your sample.

The first problem you identified is trickier. I completely understand that it is not "expected" behaviour. However, it is technically the correct behaviour, and I'm reluctant to change it for your specific use case because that would have the potential to break other use cases.

What's bothering me more, though, is that the following code should achieve what you're after:

pdf_page.flip_horizontally()?;
pdf_page.translate(pdf_page.width(), PdfPoints::ZERO)?;

in a nice, expressive, self-describing manner. (This is just a more nicely written version of your pdf_page.transform() function call.) And, indeed, the bounding boxes of every page object are "correct" after the call to translate() (by which I mean, they exactly match the bounding boxes of every page object after your transform() call). But the page objects don't appear in the rendering of the page, and I have no idea why as yet. This seems very peculiar and I want to take some more time later this week to investigate this further.
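
As a rough check of the claim that a flip followed by a translate is equivalent to the single transform() call, the sketch below composes two PDF-style matrices by hand. The Matrix alias, the then() helper, and the 612-point page width are assumptions for illustration; this is not how pdfium-render stores or combines matrices internally.

```rust
/// PDF-style affine matrix stored as [a, b, c, d, e, f], where
/// x' = a*x + c*y + e and y' = b*x + d*y + f.
/// Hypothetical alias for illustration only.
type Matrix = [f32; 6];

/// Returns the single matrix equivalent to applying `first`, then `second`.
fn then(first: Matrix, second: Matrix) -> Matrix {
    let [a1, b1, c1, d1, e1, f1] = first;
    let [a2, b2, c2, d2, e2, f2] = second;
    [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]
}

/// Mirror about the vertical axis through the origin.
fn horizontal_flip() -> Matrix {
    [-1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
}

/// Shift by (dx, dy) without scaling or rotating.
fn translation(dx: f32, dy: f32) -> Matrix {
    [1.0, 0.0, 0.0, 1.0, dx, dy]
}

fn main() {
    let page_width = 612.0; // assumed page width in points

    // Flip first, then translate: matches transform(-1, 0, 0, 1, width, 0).
    let flip_then_shift = then(horizontal_flip(), translation(page_width, 0.0));
    println!("{:?}", flip_then_shift);

    // The order matters: translating first ends up with e = -width instead.
    let shift_then_flip = then(translation(page_width, 0.0), horizontal_flip());
    println!("{:?}", shift_then_flip);
}
```

The composition only matches when the flip comes first; swapping the order negates the offset, so the content would end up one page width further into negative user space.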

Lampyrida commented 1 year ago

Thank you for looking into this @ajrcarey.

I should have mentioned actually that I did try the flip_horizontally() followed by translate(), and couldn't get that to work. I mistakenly assumed that the translate() call overwrote the transformation matrix instead of adding to it. But your findings suggest there is something weirder going on.

ajrcarey commented 1 year ago

The problem turns out to have been pretty simple. Transformations on pages are applied using the FPDFPage_TransFormWithClip() Pdfium function. Like all Pdfium transformation functions, it takes a transformation matrix. Uniquely for pages, however, it also takes a clip rectangle to which the matrix should be applied. That clip rectangle was limited to the visible area of the page, so objects moved out of the visible area by flipping around the origin were never within the clip rectangle.

I have pushed a small fix to this which increases the size of the clip rectangle from (0, 0, PdfPage::width(), PdfPage::height()) to (-PdfPage::width(), -PdfPage::height(), PdfPage::width(), PdfPage::height()). I do not think this is the optimal solution, but it is sufficient for your use case. If you take pdfium-render as a git dependency, you should find the following sample code now works as expected:

fn main() -> Result<(), PdfiumError> {
    let pdfium = Pdfium::new(Pdfium::bind_to_library(
        Pdfium::pdfium_platform_library_name_at_path("../pdfium/"),
    )?);

    let mut pdf_document = pdfium.load_pdf_from_file("../pdfium/test/export-test.pdf", None)?;
    let mut pdf_page = pdf_document.pages_mut().first()?;

    pdf_page.flip_horizontally()?;
    pdf_page.translate(pdf_page.width(), PdfPoints::ZERO)?;

    let width_in = pdf_page.width().to_inches();
    let width_px = width_in * 100.0; // 100 DPI
    let config = PdfRenderConfig::new().set_target_width(width_px.round() as Pixels);
    let image = pdf_page.render_with_config(&config)?.as_image();

    image
        .save("./output.png")
        .map_err(|_| PdfiumError::ImageError)?;

    pdf_document.save_to_file("./output.pdf")?;

    Ok(())
}
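
For anyone wanting to see why the clip rectangle mattered, here is a simplified sketch that only checks whether a flipped object's bounding box overlaps a given clip rectangle. The Rect type, the helper functions, and the page and object dimensions are all assumptions for illustration, and the sketch does not model how Pdfium combines clips across successive transformation calls.

```rust
/// Axis-aligned rectangle in page coordinates (points).
/// Hypothetical type for illustration only; not pdfium-render's PdfRect.
#[derive(Clone, Copy, Debug)]
struct Rect {
    left: f32,
    bottom: f32,
    right: f32,
    top: f32,
}

impl Rect {
    /// True when the two rectangles share some area.
    fn intersects(&self, other: &Rect) -> bool {
        self.left < other.right
            && other.left < self.right
            && self.bottom < other.top
            && other.bottom < self.top
    }
}

/// Mirrors a rectangle about the vertical axis through the origin.
fn flip_horizontally(r: Rect) -> Rect {
    Rect { left: -r.right, right: -r.left, ..r }
}

/// Clip rectangle limited to the visible page: (0, 0, width, height).
fn page_clip(width: f32, height: f32) -> Rect {
    Rect { left: 0.0, bottom: 0.0, right: width, top: height }
}

/// Enlarged clip rectangle: (-width, -height, width, height).
fn enlarged_clip(width: f32, height: f32) -> Rect {
    Rect { left: -width, bottom: -height, right: width, top: height }
}

fn main() {
    let (width, height) = (612.0, 792.0); // assumed page size in points
    let object = Rect { left: 72.0, bottom: 72.0, right: 300.0, top: 144.0 };

    // After the flip, the object spans x = -300 to x = -72.
    let flipped = flip_horizontally(object);

    // The page-sized clip does not reach into negative x; the enlarged one does.
    println!("page-sized clip overlaps: {}", flipped.intersects(&page_clip(width, height)));
    println!("enlarged clip overlaps:   {}", flipped.intersects(&enlarged_clip(width, height)));
}
```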

Lampyrida commented 1 year ago

Ah that makes a lot of sense. I just tried this out with the git dependency and it's working as expected. Thank you for the fix!

ajrcarey commented 1 year ago

Added new PdfPoints::MAX, PdfPoints::MIN, and PdfRect::MAX constants. PdfRect::MAX now contains the entire addressable page area, and PdfPage::transform() now uses it as the default clipping region rather than PdfPage::size(). The fix will be released in crate version 0.8.7.