J-F-Liu / lopdf

A Rust library for PDF document manipulation.
MIT License

Wrong letters in pdf #250

Open ThomasCartier opened 8 months ago

ThomasCartier commented 8 months ago

Hi,

The following code reports the wrong letters (each one is shifted by 3, for some reason).

The culprit: https://www.dropbox.com/scl/fi/6a8zuy70s05pntvxm0vae/test.pdf?rlkey=ylju1wbavr8rff10jp621u6bo&dl=0

It reports DEF instead of ABC.

    use lopdf::Document;

    // Built with lopdf's "nom_parser" feature; same result with "pom_parser".
    fn main() {
        let mut doc = match Document::load("/path/to/test.pdf") {
            Ok(v) => v,
            Err(_) => return,
        };

        // Inflate compressed streams before extracting text.
        doc.decompress();

        // get_pages() maps page numbers to object ids;
        // extract_text() expects page numbers.
        for (page_number, _) in doc.get_pages() {
            match doc.extract_text(&[page_number]) {
                Ok(text) => println!("{}", text),
                Err(e) => println!("Nope {}", e),
            }
        }
    }

Any idea why? Is anything wrong with my code? Thanks.
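
To measure the shift, the expected and extracted strings can be compared character by character. This is only a sketch: the `offsets` helper is hypothetical and not part of lopdf, and the two strings are the ones from this report.

```rust
/// Hypothetical helper (not part of lopdf): code point difference between
/// each extracted character and the expected character at the same position.
fn offsets(expected: &str, extracted: &str) -> Vec<i64> {
    expected
        .chars()
        .zip(extracted.chars())
        .map(|(e, x)| x as u32 as i64 - e as u32 as i64)
        .collect()
}

fn main() {
    // The viewer shows "ABC"; extract_text returns "DEF".
    let shifts = offsets("ABC", "DEF");
    println!("{:?}", shifts); // prints [3, 3, 3]
}
```

For the strings in this report every entry is 3, so the shift is the same for each letter.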

Angr1st commented 2 weeks ago

Your example PDF contains DEF as its text content but shows ABC when opened in a PDF viewer. This seems to be done by using the rg/RG operators to draw ABC manually instead of using text streams or something similar (I am no PDF expert).