Closed SFM61319 closed 2 months ago
Yeah i was appalled by the execution time too. The issue lies in the clean_metadata() function which write_to_file() calls. It pounds the fs with syscalls byte by byte to find and remove the APP1 segment from JPEG.
I've refactored the JPEG part into in-memory mutation but since then wrote my own parser for PNG and JPEG for all the markers. I can submit the pull request if the owner is fine with refactor of the clean_metadata() for only one format.
@SFM61319 Thanks for your feedback - glad you like the API design! :) Regarding the performance issue: Speed was never one of the main objectives of little_exif. I’m aware that some functions are primitive and have a lot of room for improvement regarding performance. So far during development it was a trade-off between speed and being able to follow what is going on, how metadata is encoded, etc. This however does not mean that I am not open for suggestions/pull requests/etc. on how to improve the current situation! Speaking of pull requests:
@Xuf3r I'd be happy to incorporate your modifications to metadata cleaning!
@SFM61319 There is now a new version that includes the optimized version for metadata clearing from @Xuf3r. Would be great to hear from you whether this fixed the performance issue you experienced!
Nice! There is a major performance boost in the new update. I appreciate the quick response from your side, and a satisfying one at that! Thank you so much!
Here is the new flame chart:
As the title suggests, the
little_exif::metadata::Metadata::write_to_file
method is too slow and needs to be optimized.I am using
little_exif
as a dependency in a CLI app of mine. I love the API design and appreciate the good use of Rust's type system (especially for EXIF tags). However, there is a major drawback when it comes to the performance of the crate. Most (~99%) of the time in my core logic function (that involves writing a single EXIF tag to a JPEG image) is spent inlittle_exif::metadata::Metadata::write_to_file
. Before this, my core logic function also reads the same file usingstd::fs::read
and computes its hash, and yet that part of the logic seems to only take ~1% of the time.My program without
little_exif::metadata::Metadata::write_to_file
takes less than 100ms (even after including logging IO) for 10 JPEG images (~20MB each). But withlittle_exif::metadata::Metadata::write_to_file
it takes over 4s with the same data under the same conditions. That is 40x the original execution time. And this was just a sample dataset. The program's expected input specification states the JPEG image count to be>=60,000
, the size of each image being>=20MB
. This means a task supposed to be completed in under 10 minutes will take over a day, which is unacceptable to say the least.A flame graph of the
little_exif::metadata::Metadata::write_to_file
part of my core logic function:When I asked for help on the Rust Community Discord server, someone mentioned that
little_exif::metadata::Metadata::write_to_file
was likely reading the same file again and again, 4 times in total, therefore increasing the time as well as the memory consumption. I haven't gone throughlittle_exif
's source code, so I thought I should let you know about this possible bug source.