stephenjudkins closed 1 year ago
Brilliant, thank you so much for plugging this gap! Do you have a sample PDF file containing a grey-scale image?
Here's one! gray.pdf
Great, thank you. But when I look at the image format of the image on that page, Pdfium tells me it's BGRA, not grayscale, and rendering it doesn't exercise your code path (in fact, I can delete your code entirely and the image renders perfectly fine).
Can you provide an example for which pdfium-render previously generated an error during image page object rendering, thus necessitating your code change?
Confirmed that this (very large!) image does exercise the problem. Sorry, resizing it down to a smaller image converted the colorspace.
Great, thank you. When I open the file in PdfExplorer it does seem to confirm that the image colorspace is DeviceGray, but again Pdfium identifies it (rightly or wrongly) as BGRA, and rendering the image object to an image doesn't exercise your code path. Perhaps Pdfium is doing something clever in the background that is obfuscating things.
Are you able to share the document you were working with that caused you to initially discover the missing PdfBitmapFormat::Gray handler in PdfPageImageObject::get_image_from_bitmap_handle()?
Here's a reduced example of the code I'm using to exercise this:
use pdfium_render::prelude::*;

fn go() -> Result<()> {
    // Bind to a Pdfium library in the current directory, falling back to the system library.
    let pdfium = Pdfium::new(
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
            .or_else(|_| Pdfium::bind_to_system_library())?,
    );
    let doc = pdfium.load_pdf_from_file("big.pdf", None)?;
    // Fetch the raw (unprocessed) image for every image object on every page.
    for page in doc.pages().iter() {
        for object in page.objects().iter() {
            if let Some(image) = object.as_image_object() {
                match image.get_raw_image() {
                    Ok(i) => println!("{} x {}", i.width(), i.height()),
                    Err(e) => println!("{:?}", e),
                };
            }
        }
    }
    Ok(())
}

fn main() {
    go().unwrap();
}
When I run this with my branch of pdfium-render:
stephen@boris-godunov image-handler % cargo run --release
Finished release [optimized] target(s) in 0.09s
Running `target/release/image_handler`
8104 x 10140
When I run with the latest cargo release (0.7.29) of pdfium-render:
stephen@boris-godunov image-handler % cargo run --release
Finished release [optimized] target(s) in 0.09s
Running `target/release/image_handler`
ImageError
I've gone and added some println! debugging to my local pdfium-render source code and verified that, if I remove the new match arm, we are hitting the codepath where we match on PdfBitmapFormat::Gray.
Perhaps there is a different version of pdfium we're using?
stephen@boris-godunov image-handler % shasum -a 256 libpdfium.dylib
5b28effbe31b7327e3e6485acc1d999cccf52c21815154f0a50779221daac3c3 libpdfium.dylib
I got the prebuilt library from https://github.com/bblanchon/pdfium-binaries/releases/tag/chromium%2F5579. Let me try the more recent version and see what happens....
I tried the latest release from that repo (https://github.com/bblanchon/pdfium-binaries/releases/tag/chromium%2F5619) and I'm still seeing the ImageError (also, if it's not clear, I'm on macOS/arm64).
Ah, I see the problem, and it's totally PEBKAC on my part. I was using PdfPageImageObject::get_processed_image() rather than PdfPageImageObject::get_raw_image() in my test code. It makes perfect sense that get_processed_image() would, y'know, process the color space :)
With get_raw_image(), I can indeed reproduce your original problem and I can confirm your code change resolves it. Many thanks again for plugging this gap. Your fix will be released as part of crate version 0.7.32 shortly.
Great! Thank you so much for your work here.
I've confirmed this works with a PDF that I have but could add a test case if you'd like!
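For anyone following the thread, here is a rough, standalone sketch of the kind of per-format handling discussed above: a raw bitmap buffer is matched on its pixel format, and a one-byte-per-pixel grayscale buffer needs its own arm alongside BGRA. This is not the code from the pull request and does not use pdfium-render's types; `BitmapFormat`, `DecodeError`, and `to_rgba` are hypothetical names invented for illustration, and the stride handling reflects the general convention that bitmap rows may carry padding bytes.

```rust
// Hypothetical stand-ins for illustration only; these are NOT pdfium-render's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BitmapFormat {
    Gray, // 1 byte per pixel
    Bgra, // 4 bytes per pixel, stored blue-green-red-alpha
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    BufferTooSmall,
}

/// Converts a raw bitmap buffer into tightly packed RGBA pixels.
/// `stride` is the number of bytes per row, including any padding.
fn to_rgba(
    format: BitmapFormat,
    width: usize,
    height: usize,
    stride: usize,
    buf: &[u8],
) -> Result<Vec<u8>, DecodeError> {
    let bytes_per_pixel = match format {
        BitmapFormat::Gray => 1,
        BitmapFormat::Bgra => 4,
    };
    let row_bytes = width * bytes_per_pixel;

    // The last row only needs `row_bytes`, not a full stride.
    if stride < row_bytes || (height > 0 && buf.len() < stride * (height - 1) + row_bytes) {
        return Err(DecodeError::BufferTooSmall);
    }

    let mut out = Vec::with_capacity(width * height * 4);
    for row in 0..height {
        // Skip the padding at the end of each row.
        let line = &buf[row * stride..row * stride + row_bytes];
        match format {
            // Without an arm like this one, a grayscale bitmap has no conversion path.
            BitmapFormat::Gray => {
                for &g in line {
                    out.extend_from_slice(&[g, g, g, 255]);
                }
            }
            BitmapFormat::Bgra => {
                for px in line.chunks_exact(4) {
                    out.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
                }
            }
        }
    }
    Ok(out)
}
```

Calling `to_rgba(BitmapFormat::Gray, 2, 2, 4, &[10, 20, 0, 0, 30, 40, 0, 0])` treats the buffer as a 2 x 2 image with two padding bytes per row and replicates each gray value into the red, green, and blue channels with full opacity. A regression test for the real fix would instead load a PDF with a DeviceGray image and call get_raw_image(), as in the reduced example earlier in the thread.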