TechnikTobi / little_exif

A little library for reading and writing EXIF data in pure Rust.
Apache License 2.0
18 stars 4 forks source link

[Bug]: `little_exif::metadata::Metadata::write_to_file` is too slow and requires optimization #7

Closed SFM61319 closed 2 months ago

SFM61319 commented 2 months ago

As the title suggests, the little_exif::metadata::Metadata::write_to_file method is too slow and needs to be optimized.

I am using little_exif as a dependency in a CLI app of mine. I love the API design and appreciate the good use of Rust's type system (especially for EXIF tags). However, there is a major drawback when it comes to the performance of the crate. Most (~99%) of the time in my core logic function (that involves writing a single EXIF tag to a JPEG image) is spent in little_exif::metadata::Metadata::write_to_file. Before this, my core logic function also reads the same file using std::fs::read and computes its hash, and yet that part of the logic seems to only take ~1% of the time.

My program without little_exif::metadata::Metadata::write_to_file takes less than 100ms (even after including logging IO) for 10 JPEG images (~20MB each). But with little_exif::metadata::Metadata::write_to_file it takes over 4s with the same data under the same conditions. That is 40x the original execution time. And this was just a sample dataset. The program's expected input specification states the JPEG image count to be >=60,000, the size of each image being >=20MB. This means a task supposed to be completed in under 10 minutes will take over a day, which is unacceptable to say the least.

A flame graph of the little_exif::metadata::Metadata::write_to_file part of my core logic function:

Flame graph

When I asked for help on the Rust Community Discord server, someone mentioned that little_exif::metadata::Metadata::write_to_file was likely reading the same file again and again, 4 times in total, therefore increasing the time as well as the memory consumption. I haven't gone through little_exif's source code, so I thought I should let you know about this possible bug source.

Xuf3r commented 2 months ago

Yeah i was appalled by the execution time too. The issue lies in the clean_metadata() function which write_to_file() calls. It pounds the fs with syscalls byte by byte to find and remove the APP1 segment from JPEG.

I've refactored the JPEG part into in-memory mutation but since then wrote my own parser for PNG and JPEG for all the markers. I can submit the pull request if the owner is fine with refactor of the clean_metadata() for only one format.

TechnikTobi commented 2 months ago

@SFM61319 Thanks for your feedback - glad you like the API design! :) Regarding the performance issue: Speed was never one of the main objectives of little_exif. I’m aware that some functions are primitive and have a lot of room for improvement regarding performance. So far during development it was a trade-off between speed and being able to follow what is going on, how metadata is encoded, etc. This however does not mean that I am not open for suggestions/pull requests/etc. on how to improve the current situation! Speaking of pull requests:

@Xuf3r I'd be happy to incorporate your modifications to metadata cleaning!

TechnikTobi commented 2 months ago

@SFM61319 There is now a new version that includes the optimized version for metadata clearing from @Xuf3r. Would be great to hear from you whether this fixed the performance issue you experienced!

SFM61319 commented 2 months ago

Nice! There is a major performance boost in the new update. I appreciate the quick response from your side, and a satisfying one at that! Thank you so much!

Here is the new flame chart:

Flame chart