ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Incorrect ImageFormat causes wrong render result. #119

Closed xVanTuring closed 11 months ago

xVanTuring commented 1 year ago

When I render some colored PDFs, the colors come out wrong: red turns to blue. After some debugging, I found that PdfBitmap::as_bytes actually outputs an RGBA sequence, but PdfBitmap::format says it's BGRA, so PdfBitmap::as_image swaps red and blue. I'm not sure why this happens, and I can't reproduce it in a WASM environment. The PDF can be downloaded from here on GitHub.

Output & expected

[Images: output, expected]

Code

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings = Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
        .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file(
        "F:/archive/pdf/NET-Microservices-Architecture-for-Containerized-NET-Applications.pdf",
        None,
    )?;

    let page = document.pages().get(0).unwrap();
    let bitmap = page.render_with_config(
        &PdfRenderConfig::new()
            .set_target_width(500)
            .set_maximum_height(500),
    )?;
    println!("Bitmap Format {:?}", bitmap.format());
    bitmap
        .as_image()
        .as_rgba8()
        .ok_or(PdfiumError::ImageError)?
        .save_with_format(format!("page{}.jpeg", 0), image::ImageFormat::Jpeg)
        .map_err(|_| PdfiumError::ImageError)?;
    Ok(())
}

Env

xVanTuring commented 1 year ago

Update: If I manually set set_reverse_byte_order to false, the output is correct. I noticed the note saying that do_set_flag_reverse_byte_order needs to be true for image >= 0.24, but I'm currently using image = "0.24.6".

&PdfRenderConfig::new()
    .set_target_width(width)
    .set_reverse_byte_order(false)

xVanTuring commented 1 year ago

Here is a pdf with some colored-text. colored.pdf

HeavenVolkoff commented 12 months ago

Just experienced this issue. Disabling set_reverse_byte_order fixed it. I think the as_image function should check whether reverse_byte_order is enabled, and skip the BGR conversion if it is.

ajrcarey commented 12 months ago

Thanks for your patience. The fundamental problem appears to be that even when setting the set_reverse_byte_order() flag to true, Pdfium still indicates that the bitmap format is BGRA instead of RGBA. Technically it should be the latter because the byte order of the B and R channels has been reversed, but Pdfium appears to always report BGRA irrespective of the channel order.
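To illustrate the symptom described above, here is a minimal sketch in plain Rust (no Pdfium involved) of what happens when a consumer trusts a BGRA label on data that is already RGBA. The function names are hypothetical and are not part of pdfium-render; the sketch only assumes 4 bytes per pixel.

```rust
/// Swaps the first and third byte of every 4-byte pixel in place.
/// This turns BGRA into RGBA, and RGBA back into BGRA: the operation
/// is its own inverse.
fn swap_red_and_blue(pixels: &mut [u8]) {
    for pixel in pixels.chunks_exact_mut(4) {
        pixel.swap(0, 2);
    }
}

/// Hypothetical helper: what a consumer does when it believes the
/// reported format is BGRA and wants RGBA output.
fn normalize_trusting_bgra_label(raw: &[u8]) -> Vec<u8> {
    let mut out = raw.to_vec();
    swap_red_and_blue(&mut out);
    out
}
```

If the raw buffer really is BGRA, the swap yields correct RGBA. If the buffer is already RGBA (because the byte order was reversed at render time) but is still labelled BGRA, the same swap turns an opaque red pixel `[255, 0, 0, 255]` into `[0, 0, 255, 255]`, which displays as blue.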

I'm wondering if changing the default value of set_reverse_byte_order() to false is sufficient to correct the behaviour. In my testing, every colored PDF I rendered was affected by the problem. Do either of you have examples of PDFs that are not affected by this bug?

ajrcarey commented 12 months ago

As to why the problem does not affect WASM: if you are using the PdfBitmap::as_image_data() function in your WASM code, then the color swapping functionality used by PdfBitmap::as_image() is skipped entirely. This is a bug in its own right; whether the images are colored correctly or not, WASM code and non-WASM code should give the same result!

ajrcarey commented 12 months ago

Extracted pixel data color channel normalization functionality out of PdfBitmap::as_image() into new PdfBitmap::as_rgba_bytes() function. Deprecated PdfBitmap::as_bytes() in favour of renamed PdfBitmap::as_raw_bytes(). Updated WASM-specific PdfBitmap::as_image_data() function to consume output of PdfBitmap::as_rgba_bytes() rather than PdfBitmap::as_raw_bytes(). This ensures that color channel normalization behaviour is identical in both WASM and non-WASM builds.

As tempting as it would be to simply change the default of PdfRenderConfig::set_reverse_byte_order() from true to false, it does occur to me that this would have the side effect of altering the output of the WASM-specific PdfBitmap::as_array() function, requiring consumers to perform their own color channel normalization and thus defeating the purpose of the function (which is to avoid extra memory allocations). This means that it's likely necessary for each PdfBitmap to know if it was created from a rendering configuration where set_reverse_byte_order() was set to true or not.

ajrcarey commented 12 months ago

Added reverse byte order flag to PdfBitmap. Set flag from PdfPage::render_into_bitmap_with_settings(). Adjusted PdfBitmap::as_rgba_bytes() to take byte order flag into account when determining how to normalize pixel color channel data into RGBA.

The result is that, irrespective of the setting of set_reverse_byte_order(), color output of PdfBitmap::as_image() and PdfBitmap::as_image_data() is always RGBA and output is consistent between WASM and non-WASM builds. The only functions affected by the setting of the byte order flag are PdfBitmap::as_raw_bytes() and the WASM-specific PdfBitmap::as_array().
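The approach described above can be sketched roughly as follows. This is not the crate's actual implementation: `SketchBitmap` and its fields are hypothetical stand-ins, and the method names only mirror the ones mentioned in this thread. The sketch assumes 4 bytes per pixel and that a reversed byte order means the raw data is already RGBA.

```rust
/// Hypothetical stand-in for a rendered bitmap; not the crate's real type.
struct SketchBitmap {
    /// Raw pixel data, 4 bytes per pixel, exactly as the renderer produced it.
    raw: Vec<u8>,
    /// True if the renderer was asked to reverse the byte order,
    /// i.e. the raw data is already RGBA rather than BGRA.
    reverse_byte_order: bool,
}

impl SketchBitmap {
    /// Returns the raw bytes untouched; channel order depends on the flag.
    fn as_raw_bytes(&self) -> &[u8] {
        &self.raw
    }

    /// Always returns RGBA, swapping channels only when the raw data is BGRA.
    fn as_rgba_bytes(&self) -> Vec<u8> {
        if self.reverse_byte_order {
            // Already RGBA: copy without touching the channels.
            self.raw.clone()
        } else {
            // BGRA: reorder each pixel to R, G, B, A.
            self.raw
                .chunks_exact(4)
                .flat_map(|p| [p[2], p[1], p[0], p[3]])
                .collect()
        }
    }
}
```

With the flag stored on the bitmap, the same red pixel normalizes to `[255, 0, 0, 255]` whether it was rendered as RGBA or BGRA, and only the raw accessor exposes the difference.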

Updated README. Ready to release as part of crate version 0.8.16, pending additional testing.

ajrcarey commented 11 months ago

Released as part of crate version 0.8.16.