ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

Segfault after rendering bitmaps #107

Status: Closed (frnsys closed this issue 1 year ago)

frnsys commented 1 year ago

Hi, thanks for this library. I'm encountering a problem where I get a segfault when the PDF document or pages are dropped:

use std::path::Path;

use pdfium_render::prelude::*;

fn extract_annotations_images(path: &Path) -> Result<(), PdfiumError> {
    let pdfium = Pdfium::new(
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path(
            "./pdfium/lib/",
        ))
        .or_else(|_| Pdfium::bind_to_system_library())?,
    );

    let mut document = pdfium.load_pdf_from_file(path, None)?;

    for (page_num, mut page) in document.pages_mut().iter().enumerate() {
        for i in 0..page.annotations().len() {
            let annotation = page.annotations().get(i).unwrap();
            if let PdfPageAnnotationType::Square = annotation.annotation_type() {
                let bounds = annotation.bounds().unwrap();
                let conf = PdfRenderConfig::new()
                    .render_annotations(false)
                    .scale_page_by_factor(3.);
                let orig_crop = page.boundaries().crop().unwrap().bounds;
                page.boundaries_mut().set_crop(bounds).unwrap();
                {
                    // If I comment these two lines out, no segfault.
                    let bmap = page.render_with_config(&conf).unwrap();
                    bmap.as_image()
                        .save_with_format("/tmp/foo.png", image::ImageFormat::Png)
                        .unwrap();
                }
                page.boundaries_mut().set_crop(orig_crop).unwrap();
            }
        }
    }

    Ok(())

    // Segfault here
}
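
For context on why the crash shows up at the closing brace: Rust runs destructors for local values when they go out of scope, in reverse order of declaration, so in the function above `document` is dropped before `pdfium`, and `bmap` is dropped at the end of its inner block. The sketch below only illustrates that ordering with a hypothetical stand-in type (`Tracked` is not a pdfium-render type); it does not show which drop actually faults.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared log that records the order in which values are dropped.
type DropLog = Rc<RefCell<Vec<String>>>;

/// Hypothetical stand-in for a resource-owning handle (NOT a pdfium-render type).
struct Tracked {
    name: &'static str,
    log: DropLog,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        // Record the name at the moment the destructor runs.
        self.log.borrow_mut().push(self.name.to_string());
    }
}

/// Mirrors the shape of the function above: a library handle, a document,
/// and a short-lived bitmap created in an inner block.
fn drop_order() -> Vec<String> {
    let log: DropLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _library = Tracked { name: "library", log: Rc::clone(&log) };
        let _document = Tracked { name: "document", log: Rc::clone(&log) };
        {
            let _bitmap = Tracked { name: "bitmap", log: Rc::clone(&log) };
        } // `_bitmap` is dropped here, at the end of the inner block.
    } // `_document` is dropped first, then `_library` (reverse declaration order).
    let order = log.borrow().clone();
    order
}
```

Calling `drop_order()` returns the names in the order their destructors ran. Wrapping suspect values in explicit blocks (or calling `drop(value)`) in the same way is one possible way to narrow down which destructor triggers a crash.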

I've tried with both the master branch and version 8.10.0. pdfium is version 118.0.5989.0.

As I was preparing this example I noticed something strange.

The original copy of the PDF I have (test.pdf in the attachments) segfaults with the bitmap lines, but a copy I made using pdftk test.pdf cat 0-10 output test_copy.pdf doesn't segfault. If I just create a direct copy with pdftk test.pdf cat 0-10 output test_copy_2.pdf that copy does segfault.

Any idea what could be going wrong?

pdfs.zip

frnsys commented 1 year ago

I played around with this some more and I can avoid the segfault by changing the config to:

                let conf = PdfRenderConfig::new()
                    .render_form_data(false) // Added this
                    .render_annotations(false)
                    .scale_page_by_factor(3.);

Comparing the output of pdfinfo for each file:

Pdf that segfaults:

Custom Metadata: yes
Metadata Stream: yes
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            AcroForm
JavaScript:      no
Pages:           10
Encrypted:       no
Page size:       595.276 x 793.701 pts
Page rot:        0
File size:       3060469 bytes
Optimized:       no
PDF version:     1.7

Pdf that doesn't segfault:

Custom Metadata: no
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           10
Encrypted:       no
Page size:       595.276 x 793.701 pts
Page rot:        0
File size:       3031783 bytes
Optimized:       no
PDF version:     1.7

So I guess it has something to do with AcroForm?
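
The comparison above can also be done mechanically. The following is a minimal sketch, using only the standard library, that parses two `pdfinfo`-style outputs and lists the fields whose values differ. The function and constant names are made up for illustration, and the parser assumes simple `Key: value` lines like the ones pasted above.

```rust
use std::collections::BTreeMap;

/// `pdfinfo` output for the PDF that segfaults (copied from the comment above).
const SEGFAULTS: &str = "\
Custom Metadata: yes
Metadata Stream: yes
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            AcroForm
JavaScript:      no
Pages:           10
Encrypted:       no
Page size:       595.276 x 793.701 pts
Page rot:        0
File size:       3060469 bytes
Optimized:       no
PDF version:     1.7
";

/// `pdfinfo` output for the PDF that does not segfault.
const NO_SEGFAULT: &str = "\
Custom Metadata: no
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           10
Encrypted:       no
Page size:       595.276 x 793.701 pts
Page rot:        0
File size:       3031783 bytes
Optimized:       no
PDF version:     1.7
";

/// Parse "Key: value" lines into a map sorted by key; lines without ':' are skipped.
fn parse_pdfinfo(text: &str) -> BTreeMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Return (key, left value, right value) for each key of `left` whose value differs.
/// Keys that appear only in `right` are not reported by this sketch.
fn diff_fields(left: &str, right: &str) -> Vec<(String, String, String)> {
    let a = parse_pdfinfo(left);
    let b = parse_pdfinfo(right);
    let mut out = Vec::new();
    for (key, va) in &a {
        let vb = b.get(key).cloned().unwrap_or_else(|| "<missing>".to_string());
        if *va != vb {
            out.push((key.clone(), va.clone(), vb));
        }
    }
    out
}
```

`diff_fields(SEGFAULTS, NO_SEGFAULT)` reports four differing fields: Custom Metadata, File size, Form, and Metadata Stream.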

ajrcarey commented 1 year ago

Hi @frnsys, thank you for reporting the issue.

I cannot reproduce the problem on Arch Linux with Pdfium build 118.0.5989.0 sourced from https://github.com/bblanchon/pdfium-binaries/releases/tag/chromium%2F5989. Tell me about your operating system and runtime environment.

I used the following test code, which I believe is almost identical to what you originally provided. All I did was add a main() and some logging statements.

use pdfium_render::prelude::*;

fn main() -> Result<(), PdfiumError> {
    extract_annotations_images("./test.pdf")?;
    extract_annotations_images("./test_2.pdf")?;
    extract_annotations_images("./test_copy.pdf")?;

    Ok(())
}

fn extract_annotations_images(path: &str) -> Result<(), PdfiumError> {
    let pdfium = Pdfium::new(
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?,
    );

    let mut document = pdfium.load_pdf_from_file(path, None)?;

    for (page_num, mut page) in document.pages_mut().iter().enumerate() {
        println!("{}, page {}", path, page_num);

        for i in 0..page.annotations().len() {
            let annotation = page.annotations().get(i).unwrap();
            if let PdfPageAnnotationType::Square = annotation.annotation_type() {
                println!("   Annotation {}", i);

                let bounds = annotation.bounds().unwrap();
                let conf = PdfRenderConfig::new()
                    .render_annotations(false)
                    .scale_page_by_factor(3.);
                let orig_crop = page.boundaries().crop().unwrap().bounds;
                page.boundaries_mut().set_crop(bounds).unwrap();
                {
                    // If I comment these two lines out, no segfault.
                    let bmap = page.render_with_config(&conf).unwrap();
                    bmap.as_image()
                        .save_with_format(
                            format!("./foo-{}-{}.png", page_num, i),
                            image::ImageFormat::Png,
                        )
                        .unwrap();
                }
                page.boundaries_mut().set_crop(orig_crop).unwrap();
            }
        }
    }

    Ok(())

    // Segfault here
}

ajrcarey commented 1 year ago

(PS I think from memory you can set a render clip as part of your render config, so you don't have to apply a crop boundary to each page. Using render config may be more convenient.)
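
If a clip is given in bitmap pixels rather than PDF points, the annotation bounds need converting first. The sketch below shows only that arithmetic, using the standard library. It assumes an unrotated page whose origin is at its bottom-left corner, a uniform scale factor (3.0 in the code above), and a clip expressed as left/top/right/bottom pixel coordinates measured from the top-left of the bitmap. The struct and function names are hypothetical and not part of pdfium-render; check the crate documentation for the exact clip method and its units.

```rust
/// Rectangle in PDF points; origin at the bottom-left of the page, y grows upward.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PointsRect {
    left: f32,
    bottom: f32,
    right: f32,
    top: f32,
}

/// Rectangle in bitmap pixels; origin at the top-left of the bitmap, y grows downward.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PixelRect {
    left: i32,
    top: i32,
    right: i32,
    bottom: i32,
}

/// Convert a rectangle in points to a pixel clip for a page rendered at `scale`.
/// The y axis is flipped, and edges are rounded outward so the clip never
/// cuts into the rectangle.
fn to_pixel_clip(rect: PointsRect, page_height_pts: f32, scale: f32) -> PixelRect {
    PixelRect {
        left: (rect.left * scale).floor() as i32,
        top: ((page_height_pts - rect.top) * scale).floor() as i32,
        right: (rect.right * scale).ceil() as i32,
        bottom: ((page_height_pts - rect.bottom) * scale).ceil() as i32,
    }
}
```

The resulting four values could then be passed to whatever clip setting the render config exposes, instead of changing and restoring the page's crop box around each render.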

frnsys commented 1 year ago

Thanks for the tip!

As for the other details, I'm using Ubuntu 22.04, kernel 5.15.0-83-generic. What other details would be relevant?

ajrcarey commented 1 year ago

Well, I'm grasping at straws here, but what version of rustc are you using to compile?

Not sure off the top of my head why Arch and Ubuntu would have different behaviour. Your Ubuntu machine isn't virtualised in any way is it?

Can I absolutely, 100% confirm that you can still reproduce the problem using my sample code above?

frnsys commented 1 year ago

Bizarre. This is my Rust info:

stable-x86_64-unknown-linux-gnu (default)
rustc 1.72.0 (5680fa18f 2023-08-23)

I still get the segfault with your code, but at least I have something that works on my end now. I'm guessing it's just some quirk with my setup so happy to close this if you want.

Thanks for your help!

ajrcarey commented 1 year ago

Cannot reproduce the problem on a fresh virtual install of Ubuntu 22.04 with rust 1.72.0. Admittedly a fresh install of Ubuntu includes kernel version 6.2.0-32 rather than 5.15.0-83, but I wouldn't have thought it should make a difference anyway.

Unless you have any additional suggestions for how I can reproduce the problem, I think I'm going to close this.

frnsys commented 1 year ago

Strange, I'll keep investigating. But thank you for looking into it.