J-F-Liu / lopdf

A Rust library for PDF document manipulation.
MIT License

Improved JPEG Processing Efficiency by 10x #345

Closed: lanyeeee closed this 3 weeks ago

lanyeeee commented 3 weeks ago

Description:

This PR makes inserting JPEG images into a PDF about 10x faster by skipping the unnecessary JPEG decode step.

The image buffer is now directly used as binary data when inserting JPEG images into PDF, skipping the decode step altogether.

Reason for Change:

image-rs can retrieve an image's dimensions and color type without decoding the pixel data, so there is no need to decode a JPEG just to obtain these properties.
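
To illustrate why these properties are cheap to obtain, here is a minimal stdlib-only sketch that reads them from a JPEG's start-of-frame (SOF) segment without touching the compressed pixel data. This is not how image-rs or this PR implements it; `JpegInfo` and `read_jpeg_info` are hypothetical names used only for illustration, and the scan handles only well-formed marker sequences.

```rust
/// Properties stored in a JPEG frame header (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct JpegInfo {
    pub width: u16,
    pub height: u16,
    /// 1 = grayscale, 3 = YCbCr/RGB, 4 = CMYK
    pub components: u8,
    pub bits_per_component: u8,
}

/// Walk the marker segments until a start-of-frame segment is found.
/// No entropy-coded image data is decoded.
pub fn read_jpeg_info(buf: &[u8]) -> Option<JpegInfo> {
    // Every JPEG starts with the SOI marker FF D8.
    if buf.len() < 4 || buf[0] != 0xFF || buf[1] != 0xD8 {
        return None;
    }
    let mut pos = 2;
    while pos + 4 <= buf.len() {
        if buf[pos] != 0xFF {
            return None; // not positioned on a marker: give up
        }
        let marker = buf[pos + 1];
        if marker == 0xFF {
            pos += 1; // fill byte before a marker
            continue;
        }
        // Standalone markers carry no length field.
        if marker == 0x01 || (0xD0..=0xD7).contains(&marker) {
            pos += 2;
            continue;
        }
        // Start-of-scan or end-of-image before any frame header.
        if marker == 0xDA || marker == 0xD9 {
            return None;
        }
        let len = u16::from_be_bytes([buf[pos + 2], buf[pos + 3]]) as usize;
        if len < 2 {
            return None;
        }
        // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
        let is_sof = (0xC0..=0xCF).contains(&marker)
            && !matches!(marker, 0xC4 | 0xC8 | 0xCC);
        if is_sof {
            // Payload layout: precision(1) height(2) width(2) components(1)
            let p = pos + 4;
            if p + 6 > buf.len() {
                return None;
            }
            return Some(JpegInfo {
                bits_per_component: buf[p],
                height: u16::from_be_bytes([buf[p + 1], buf[p + 2]]),
                width: u16::from_be_bytes([buf[p + 3], buf[p + 4]]),
                components: buf[p + 5],
            });
        }
        pos += 2 + len; // skip marker bytes plus the segment
    }
    None
}
```

The header fields sit in the first few segments of the file, so the cost of reading them does not grow with the image's pixel count the way a full decode does.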

In addition, a JPEG can be embedded directly into a PDF as binary data. This is made possible by the DCTDecode filter, so decoding the JPEG is redundant.
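
As a rough sketch of what direct embedding means at the file level, the function below wraps unmodified JPEG bytes in a PDF image XObject stream whose dictionary names `/DCTDecode` as the filter. It uses only the standard library and is not lopdf's API; `jpeg_image_xobject` is a hypothetical name, the caller is assumed to supply the dimensions and component count, and 8 bits per component is assumed.

```rust
/// Build the body of a PDF image XObject that stores `jpeg` unchanged.
/// Returns None for component counts with no matching device color space.
pub fn jpeg_image_xobject(
    jpeg: &[u8],
    width: u32,
    height: u32,
    components: u8,
) -> Option<Vec<u8>> {
    let color_space = match components {
        1 => "/DeviceGray",
        3 => "/DeviceRGB",
        4 => "/DeviceCMYK",
        _ => return None,
    };
    // /Length is the size of the stored (still compressed) JPEG data.
    let header = format!(
        "<< /Type /XObject /Subtype /Image /Width {width} /Height {height} \
         /ColorSpace {color_space} /BitsPerComponent 8 \
         /Filter /DCTDecode /Length {} >>\nstream\n",
        jpeg.len()
    );
    let mut out = header.into_bytes();
    out.extend_from_slice(jpeg); // raw JPEG bytes, no re-encoding
    out.extend_from_slice(b"\nendstream");
    Some(out)
}
```

Because the PDF viewer applies the DCT decoding itself, the writer never needs the decoded pixels.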

Implementation Details:

Modified the image_from function to accept a binary buffer and a file path. image_from now checks the format of the buffer; if it is JPEG, the buffer is used directly for PDF embedding without decoding.

For non-JPEG formats, the image is still decoded to determine properties like dimensions and color type, but JPEG images bypass the decode step entirely.
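
The branching described above can be sketched as follows. This is an illustration only, not the actual image_from code: `EmbedPlan` and `plan_embedding` are hypothetical names, and the format check here is a plain magic-byte comparison rather than whatever detection the PR uses.

```rust
/// What to do with an image buffer before writing it into the PDF.
#[derive(Debug, PartialEq)]
pub enum EmbedPlan<'a> {
    /// JPEG: store the original bytes unchanged behind /DCTDecode.
    PassThrough(&'a [u8]),
    /// Any other format: decode to raw pixels first.
    Decode(&'a [u8]),
}

/// Choose a plan from the leading bytes of the buffer.
pub fn plan_embedding(buf: &[u8]) -> EmbedPlan<'_> {
    // JPEG files begin with the SOI marker FF D8 followed by another marker (FF ..).
    if buf.starts_with(&[0xFF, 0xD8, 0xFF]) {
        EmbedPlan::PassThrough(buf)
    } else {
        EmbedPlan::Decode(buf)
    }
}
```

Only the `Decode` branch pays for a full decode, which is why the saving is limited to JPEG input.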

These optimizations lead to a significant performance gain, especially in scenarios involving large numbers of JPEG images.