ajrcarey / pdfium-render

A high-level idiomatic Rust wrapper around Pdfium, the C++ PDF library used by the Google Chromium project.
https://crates.io/crates/pdfium-render

`PdfDocument::bookmarks::iter` skips the root bookmark #120

Closed. xVanTuring closed this issue 10 months ago.

xVanTuring commented 11 months ago

The documentation says iteration starts from the top-level root bookmark. I assume that means the root (first) bookmark is included.

Code

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings = Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
        .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document: PdfDocument<'_> = pdfium.load_pdf_from_file(
        "F:/archive/pdf/NET-Microservices-Architecture-for-Containerized-NET-Applications.pdf",
        None,
    )?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

Output

root: Introduction to Containers and Docker
Iter: ## skipped the root bookmark
0: Choosing Between .NET and .NET Framework for Docker Containers
1: Architecting container and microservice-based applications
2: Development process for Docker-based applications
3: Designing and Developing Multi-Container and Microservice-Based .NET Applications
4: Tackle Business Complexity in a Microservice with DDD and CQRS Patterns
5: Implement resilient applications
6: Make secure .NET Microservices and Web Applications
7: .NET Microservices Architecture key takeaways
xVanTuring commented 11 months ago

Also, iter_all_descendants() does not seem to work as described (it should iterate over every node and all of its children).

Code

let bookmarks = document.bookmarks();
let root = bookmarks.root().unwrap();
println!("root: {}", root.title().unwrap());
for (idx, bookmark) in root.iter_all_descendants().enumerate() {
    println!("    {idx}: {}", bookmark.title().unwrap());
}

Output

root: Introduction to Containers and Docker
    0: What is Docker?
    1: Docker terminology
    2: Docker containers, images, and registries

But "0: What is Docker?" has sub-bookmarks of its own, and none of them appear in the output.
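To make the distinction concrete, here is a minimal sketch in plain Rust that does not use pdfium-render at all. It assumes a hypothetical arena of `Node` values, each holding a title plus optional indices of its first child and next sibling (the first-child/next-sibling shape a PDF outline uses). `direct_children` follows only one level of links, while `all_descendants` recurses, so grandchildren such as the sub-bookmarks of "What is Docker?" would be included. The output reported above looks like the first of the two.

```rust
// Hypothetical model of a bookmark outline (not pdfium-render's types):
// each node stores its title plus the indices of its first child and next
// sibling, the same first-child/next-sibling links a PDF outline uses.
struct Node {
    title: &'static str,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

/// One level only: follow `first_child`, then walk `next_sibling` links.
fn direct_children(nodes: &[Node], idx: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut cur = nodes[idx].first_child;
    while let Some(c) = cur {
        out.push(c);
        cur = nodes[c].next_sibling;
    }
    out
}

/// Every level: emit each child, then recurse into it (depth-first pre-order).
fn all_descendants(nodes: &[Node], idx: usize) -> Vec<usize> {
    let mut out = Vec::new();
    for c in direct_children(nodes, idx) {
        out.push(c);
        out.extend(all_descendants(nodes, c));
    }
    out
}

/// Map node indices back to their titles.
fn titles(nodes: &[Node], ids: &[usize]) -> Vec<&'static str> {
    ids.iter().map(|&i| nodes[i].title).collect()
}

/// Hypothetical tree: root -> [A -> [A.1, A.2], B]
fn sample_tree() -> Vec<Node> {
    vec![
        Node { title: "root", first_child: Some(1), next_sibling: None }, // 0
        Node { title: "A", first_child: Some(2), next_sibling: Some(4) }, // 1
        Node { title: "A.1", first_child: None, next_sibling: Some(3) },  // 2
        Node { title: "A.2", first_child: None, next_sibling: None },     // 3
        Node { title: "B", first_child: None, next_sibling: None },       // 4
    ]
}

fn main() {
    let nodes = sample_tree();
    // prints ["A", "B"]
    println!("{:?}", titles(&nodes, &direct_children(&nodes, 0)));
    // prints ["A", "A.1", "A.2", "B"]
    println!("{:?}", titles(&nodes, &all_descendants(&nodes, 0)));
}
```

The only difference between the two functions is the recursive call; a descendants iterator that omits it degrades to a direct-children iterator.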


ajrcarey commented 11 months ago

Hi @xVanTuring , thank you for reporting the issue. Let's focus on this issue first, since you have a work-around for your other issue. Are you able to provide a non-copyrighted sample document that demonstrates the problem?

xVanTuring commented 11 months ago


Bookmark.pdf: here is a simple PDF I made that contains only some bookmarks.

ajrcarey commented 10 months ago

I agree: the traversal methodology used by PdfBookmarksIterator is rather peculiar, and it gives unexpected results. I have rewritten the iterator to use a standard depth-first graph traversal. Using a slightly adjusted version of your sample code:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

and applying it to your sample document, I now get the following output:

root: Chapter 1
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2

which looks more like the expected result.
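A depth-first traversal like the one described above can be written without recursion by keeping an explicit stack. This is a minimal sketch, not pdfium-render's actual PdfBookmarksIterator: it assumes a hypothetical `Node` arena with first-child and next-sibling indices, and the tree is Bookmark.pdf with its nesting inferred from the section numbers in the output above. Pushing a node's next sibling before its first child means the node's whole subtree is finished before the sibling is visited, and because the stack starts with the root itself, the root is yielded first and its sibling "Chapter 2" follows after the root's subtree.

```rust
// Hypothetical outline node (not pdfium-render's PdfBookmark): indices of
// the first child and the next sibling, as in a PDF outline's First/Next links.
struct Node {
    title: &'static str,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

fn n(title: &'static str, first_child: Option<usize>, next_sibling: Option<usize>) -> Node {
    Node { title, first_child, next_sibling }
}

/// Depth-first, pre-order iterator starting at `root` and continuing
/// through the root's following siblings.
struct DepthFirst<'a> {
    nodes: &'a [Node],
    stack: Vec<usize>,
}

impl<'a> Iterator for DepthFirst<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        let idx = self.stack.pop()?;
        let nodes = self.nodes;
        let node = &nodes[idx];
        // Sibling goes on the stack first, so it is popped only after the
        // first child (pushed last) and that child's whole subtree.
        if let Some(s) = node.next_sibling {
            self.stack.push(s);
        }
        if let Some(c) = node.first_child {
            self.stack.push(c);
        }
        Some(node)
    }
}

fn iter_from(nodes: &[Node], root: usize) -> DepthFirst<'_> {
    // The root itself starts on the stack, so it is the first item yielded.
    DepthFirst { nodes, stack: vec![root] }
}

/// Bookmark.pdf as reconstructed from the thread's output (assumed nesting).
fn bookmark_pdf_tree() -> Vec<Node> {
    vec![
        n("Chapter 1", Some(1), Some(8)), // 0
        n("1.1", None, Some(2)),          // 1
        n("1.2", Some(3), Some(7)),       // 2
        n("1.2.1", None, Some(4)),        // 3
        n("1.2.2", Some(5), None),        // 4
        n("1.2.2.1", None, Some(6)),      // 5
        n("1.2.2.2", None, None),         // 6
        n("1.3", None, None),             // 7
        n("Chapter 2", Some(9), None),    // 8
        n("2.1", None, Some(10)),         // 9
        n("2.2", Some(11), Some(15)),     // 10
        n("2.2.1", None, Some(12)),       // 11
        n("2.2.2", Some(13), None),       // 12
        n("2.2.2.1", None, Some(14)),     // 13
        n("2.2.2.2", None, None),         // 14
        n("2.3", Some(16), None),         // 15
        n("2.3.1", None, Some(17)),       // 16
        n("2.3.2", None, None),           // 17
    ]
}

fn main() {
    let nodes = bookmark_pdf_tree();
    // Prints the same 18 titles, in the same order, as "Iter entire tree from root".
    for (idx, bookmark) in iter_from(&nodes, 0).enumerate() {
        println!("{idx}: {}", bookmark.title);
    }
}
```

An explicit stack avoids deep recursion on heavily nested outlines and maps naturally onto a lazy `Iterator::next`, which is the shape a Rust iterator type needs.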

ajrcarey commented 10 months ago

I extended the sample code to check siblings as well:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root siblings:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_siblings().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

I made a small change to PdfBookmarksIterator to ensure that a skip sibling is never yielded during iteration. This prevents a bookmark from being included in its own list of siblings. The sample code output is now:

root: Chapter 1
Iter root siblings:
0: Chapter 2
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2
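The effect of the skip-sibling change can be illustrated with a small standalone sketch. It assumes that the "skip sibling" is the bookmark whose siblings are being requested; the `Node` arena and its `parent` field are hypothetical and are not pdfium-render's types. A parent link is used here because a first-child/next-sibling chain alone cannot reach siblings that come before the bookmark. The tree is a cut-down version of Bookmark.pdf.

```rust
// Hypothetical outline node: besides first-child/next-sibling links it
// records its parent, so siblings that come before a node can be reached.
struct Node {
    title: &'static str,
    parent: Option<usize>,
    first_child: Option<usize>,
    next_sibling: Option<usize>,
}

/// Titles of every sibling of `idx` in document order, excluding `idx`
/// itself. `first_top_level` is the index where the top-level list starts.
fn siblings(nodes: &[Node], first_top_level: usize, idx: usize) -> Vec<&'static str> {
    // Start of the list `idx` belongs to: its parent's first child, or the
    // first top-level bookmark when `idx` has no parent.
    let mut cur = match nodes[idx].parent {
        Some(p) => nodes[p].first_child,
        None => Some(first_top_level),
    };
    let mut out = Vec::new();
    while let Some(i) = cur {
        // The node we were asked about acts as the skip sibling: never yield it.
        if i != idx {
            out.push(nodes[i].title);
        }
        cur = nodes[i].next_sibling;
    }
    out
}

/// Cut-down Bookmark.pdf: Chapter 1 -> [1.1, 1.2, 1.3], then Chapter 2.
fn sample_tree() -> Vec<Node> {
    vec![
        Node { title: "Chapter 1", parent: None, first_child: Some(1), next_sibling: Some(4) }, // 0
        Node { title: "1.1", parent: Some(0), first_child: None, next_sibling: Some(2) },       // 1
        Node { title: "1.2", parent: Some(0), first_child: None, next_sibling: Some(3) },       // 2
        Node { title: "1.3", parent: Some(0), first_child: None, next_sibling: None },          // 3
        Node { title: "Chapter 2", parent: None, first_child: None, next_sibling: None },       // 4
    ]
}

fn main() {
    let nodes = sample_tree();
    // Siblings of "Chapter 1": prints ["Chapter 2"]
    println!("{:?}", siblings(&nodes, 0, 0));
    // Siblings of "1.2": prints ["1.1", "1.3"]
    println!("{:?}", siblings(&nodes, 0, 2));
}
```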

I have updated the README. This is ready to release as part of 0.8.16.