fschutt / printpdf

A fully-featured PDF library for Rust, WASM-ready
https://fschutt.github.io/printpdf/
MIT License

PDF file size is 3.5x bigger than its contents #174

Closed by vklachkov 8 months ago

vklachkov commented 8 months ago

I'm using your library to convert images to PDF, and I noticed that the resulting PDF is several times larger than the original image:

```
> ls -hal .
-rw-r--r-- 1 valera valera 186K Mar 15 18:23 Input.jpg
-rw-r--r-- 1 valera valera 665K Mar 15 18:47 Output.pdf
```
Code

Usage: `cargo run --release -- ./Input.jpg ./Output.pdf`

```rust
use printpdf::{
    image_crate::{codecs::jpeg::JpegDecoder, ImageDecoder},
    *,
};
use std::{
    env::args,
    fs,
    io::{self, Read},
};

fn main() {
    let input = args().nth(1).unwrap();
    let output = args().nth(2).unwrap();

    // Get jpeg info.
    let decoder = JpegDecoder::new(fs::File::open(&input).unwrap()).unwrap();
    let (width, height) = decoder.dimensions();
    let (width, height) = (Px(width as _), Px(height as _));
    let (color_space, bits_per_component) = match decoder.color_type() {
        image_crate::ColorType::L8 => (ColorSpace::Greyscale, ColorBits::Bit8),
        image_crate::ColorType::La8 => (ColorSpace::GreyscaleAlpha, ColorBits::Bit8),
        image_crate::ColorType::Rgb8 => (ColorSpace::Rgb, ColorBits::Bit8),
        image_crate::ColorType::Rgba8 => (ColorSpace::Rgba, ColorBits::Bit8),
        image_crate::ColorType::L16 => (ColorSpace::Greyscale, ColorBits::Bit16),
        image_crate::ColorType::La16 => (ColorSpace::GreyscaleAlpha, ColorBits::Bit16),
        image_crate::ColorType::Rgb16 => (ColorSpace::Rgb, ColorBits::Bit16),
        image_crate::ColorType::Rgba16 => (ColorSpace::Rgba, ColorBits::Bit16),
        _ => panic!("Unsupported jpeg color type {:?}", decoder.color_type()),
    };

    // Read bytes from input.
    let mut image_data = Vec::new();
    fs::File::open(input)
        .unwrap()
        .read_to_end(&mut image_data)
        .unwrap();

    // Create document from image.
    let doc = PdfDocument::empty("Document");
    let dpi = 300.0; // TODO: What is DPI?
    let (page, layer) = doc.add_page(
        Mm::from(width.into_pt(dpi)),
        Mm::from(height.into_pt(dpi)),
        "Image Layer",
    );

    Image::from(ImageXObject {
        width,
        height,
        color_space,
        bits_per_component,
        interpolate: false,
        image_data,
        image_filter: Some(ImageFilter::DCT),
        smask: None,
        clipping_bbox: None,
    })
    .add_to_layer(
        doc.get_page(page).get_layer(layer),
        ImageTransform::default(),
    );

    // Save document to output.
    let file = fs::OpenOptions::new()
        .create(true)
        .write(true)
        .truncate(true)
        .open(output)
        .unwrap();

    let mut writer = io::BufWriter::with_capacity(128 * 1024, file);
    doc.save(&mut writer).unwrap();
}
```

Files:

Input Output

After comparing the input and the output, I discovered that the library adds two FlateDecode streams before the source image. What are they for, and how can I remove them? Or am I converting the image to PDF incorrectly?
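To see where the extra bytes go, it can help to list every stream in the output file together with its filter and approximate size. The following is a rough sketch that uses only the standard library. The names `list_streams`, `filter_name`, `find` and `rfind` are hypothetical helpers, not part of printpdf, and the scan is deliberately naive: it does not parse the PDF object structure, and binary data that happens to contain `endstream` would confuse it.

```rust
/// First occurrence of `needle` in `haystack` at or after index `from`.
fn find(haystack: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if from > haystack.len() {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| i + from)
}

/// Last occurrence of `needle` in `haystack`.
fn rfind(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).rposition(|w| w == needle)
}

/// First filter name in a stream dictionary, e.g. "FlateDecode".
/// Handles both `/Filter /Name` and `/Filter [/Name ...]`.
fn filter_name(dict: &[u8]) -> Option<String> {
    let key = b"/Filter";
    let start = find(dict, key, 0)? + key.len();
    let rest = &dict[start..];
    let slash = rest.iter().position(|&b| b == b'/')?;
    // Only whitespace or an array bracket may sit between the key and the name.
    if !rest[..slash]
        .iter()
        .all(|b| b.is_ascii_whitespace() || *b == b'[')
    {
        return None;
    }
    let name: Vec<u8> = rest[slash + 1..]
        .iter()
        .copied()
        .take_while(|b| b.is_ascii_alphanumeric())
        .collect();
    String::from_utf8(name).ok()
}

/// Returns `(filter, byte_count)` for every `stream ... endstream` pair.
/// The byte count includes any end-of-line marker before `endstream`.
fn list_streams(pdf: &[u8]) -> Vec<(Option<String>, usize)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(kw) = find(pdf, b"stream", pos) {
        // Ignore the tail of a stray `endstream` keyword.
        if kw >= 3 && &pdf[kw - 3..kw] == b"end" {
            pos = kw + 6;
            continue;
        }
        // Stream data begins after the end-of-line that follows `stream`.
        let mut data = kw + 6;
        if pdf.get(data) == Some(&b'\r') {
            data += 1;
        }
        if pdf.get(data) == Some(&b'\n') {
            data += 1;
        }
        let end = match find(pdf, b"endstream", data) {
            Some(end) => end,
            None => break,
        };
        // Treat everything from the preceding ` obj` keyword up to `stream`
        // as the stream dictionary.
        let dict_start = rfind(&pdf[..kw], b" obj").map_or(0, |i| i + 4);
        out.push((filter_name(&pdf[dict_start..kw]), end - data));
        pos = end + 9;
    }
    out
}
```

Calling `list_streams(&std::fs::read("Output.pdf").unwrap())` on the file from the question would be expected to report one `DCTDecode` entry of roughly the JPEG's size next to the `FlateDecode` entries, which shows how much each extra stream contributes.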

fschutt commented 8 months ago

Disable the ICC color profile, which is embedded by default: https://github.com/fschutt/printpdf/blob/master/examples/no_icc.rs#L17-L21

When I wrote this library, my goal was to make the generated PDFs conform to the PDF/X standards, and one of their requirements is that the file includes a valid color profile.
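Applied to the code from the question, the change would look roughly like the sketch below. It assumes the printpdf API generation that the question's code targets (`PdfDocument::empty`, `with_conformance`, `PdfConformance::Custom` and `CustomPdfConformance` with the fields `requires_icc_profile` and `requires_xmp_metadata`, as used in the linked example); names may differ in other versions of the crate. The page size and output path are placeholders.

```rust
use printpdf::{CustomPdfConformance, Mm, PdfConformance, PdfDocument};
use std::{fs::File, io::BufWriter};

fn main() {
    // Same starting point as in the question, but with a custom conformance
    // level instead of the default PDF/X one.
    let doc = PdfDocument::empty("Document").with_conformance(PdfConformance::Custom(
        CustomPdfConformance {
            // Do not embed the default ICC color profile.
            requires_icc_profile: false,
            // The linked example also turns off the XMP metadata block.
            requires_xmp_metadata: false,
            ..Default::default()
        },
    ));

    // Placeholder page (A4); the question derives the size from the image.
    let (_page, _layer) = doc.add_page(Mm(210.0), Mm(297.0), "Image Layer");

    // Placeholder output path.
    let file = File::create("Output.pdf").unwrap();
    doc.save(&mut BufWriter::new(file)).unwrap();
}
```

The trade-off is the one described above: without the embedded profile the file no longer meets the PDF/X requirement, which only matters if the output has to pass that kind of validation.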

vklachkov commented 8 months ago

@fschutt Thank you very much for your help and for the library!