J-F-Liu / lopdf

A Rust library for PDF document manipulation.
MIT License

How to read operations correctly? #279

Closed. Boltzmachine closed this issue 6 months ago

Boltzmachine commented 6 months ago

Many files on my computer cannot be read properly. I read the contents with:

use lopdf::{Document, Object};

fn main() {
    let doc = Document::load("file.pdf").expect("failed to load pdf");
    let pages = doc.get_pages();
    // Pages are keyed by 1-based page number.
    let page_id = pages.get(&1).unwrap();
    let content_streams = doc.get_page_contents(*page_id);
    for object_id in content_streams {
        if let Ok(content_stream) = doc.get_object(object_id).and_then(Object::as_stream) {
            let content = content_stream.decode_content().unwrap();
            for op in content.operations {
                println!("{:?}", op);
            }
        }
    }
}

For example, for the file file1.pdf, it prints:

Operation { operator: "x", operands: [] }

kusaanko commented 6 months ago

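You need to decompress the stream before decoding it. Otherwise the still-compressed bytes are parsed as if they were operators, which is where the "x" above comes from: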

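// `doc` must be declared `mut`: decompress() needs a mutable stream, hence get_object_mut.
if let Ok(content_stream) = doc.get_object_mut(object_id).and_then(Object::as_stream_mut) {
    // Decompress in place; the return value (if any) is ignored here.
    let _ = content_stream.decompress();
    let content = content_stream.decode_content().unwrap();
}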