Webreaper / Damselfly

Damselfly is a server-based Photograph Management app. The goal of Damselfly is to index an extremely large collection of images, and allow easy search and retrieval of those images, using metadata such as the IPTC keyword tags, as well as the folder and file names. Damselfly includes support for object/face detection.
GNU General Public License v3.0

Improve performance of hashing and reduce memory #489

Closed vcsjones closed 1 year ago

vcsjones commented 1 year ago

πŸ‘‹ I had a look at the use of hashing here for detecting duplicate images. Currently, this will allocate an array of bytes per pixel-row of each image, which can a significant amount of memory usage.

Instead, we can rely on `Span<byte>` to remove the allocations altogether. Benchmarking against a 3024 × 4032 image (the size my iPhone currently takes, so it seems representative):
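As a rough sketch of the span-based approach (illustrative code, not the actual patch — the raw pixel buffer and row stride here are stand-ins for whatever SkiaSharp/ImageSharp expose):

```csharp
using System;
using System.Security.Cryptography;

public static class RowHashSketch
{
    // Hash an image's pixels row by row. Each row is a slice of the existing
    // buffer, so no per-row byte[] is ever allocated.
    public static string HashPixels(ReadOnlySpan<byte> pixels, int rowBytes)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA1);

        for (int offset = 0; offset < pixels.Length; offset += rowBytes)
            hasher.AppendData(pixels.Slice(offset, rowBytes)); // ReadOnlySpan<byte> overload, no copy

        Span<byte> digest = stackalloc byte[20]; // SHA-1 digest is 20 bytes
        hasher.TryGetHashAndReset(digest, out _);
        return Convert.ToHexString(digest);
    }
}
```

Because `AppendData` has a `ReadOnlySpan<byte>` overload, the only heap allocation left is the final hex string.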

Before

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|
| Skia_GetHash | 23.08 ms | 0.169 ms | 0.158 ms | 7750.0000 | 46.61 MB |
| ImageSharp_GetHash | 22.93 ms | 0.160 ms | 0.150 ms | 7750.0000 | 46.61 MB |

After

| Method | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|
| Skia_GetHash | 20.19 ms | 0.162 ms | 0.152 ms | 1.21 KB |
| ImageSharp_GetHash | 20.01 ms | 0.047 ms | 0.042 ms | 1.29 KB |

This effectively makes the memory usage 1.2 kilobytes regardless of the image size, down from 46 megabytes.

A smaller optimization is to place the hash in a stack buffer before converting it to hex, which saves one small array allocation for the hash itself. This uses `Convert.ToHexString`, since it can operate natively on a `ReadOnlySpan<byte>` and also emits upper-case letters, but if you prefer I can create an overload of your extension method that works on a span as well.

Additionally, this fixes a tiny issue where IncrementalHash is not being disposed. This does not leak memory, but results in finalizers getting run during garbage collection.
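The two smaller fixes can be combined into one sketch (again illustrative, with assumed names, not the code from this PR): the digest lands in a `stackalloc` buffer, `Convert.ToHexString` reads straight from that span, and a `using` declaration ensures the `IncrementalHash` is disposed instead of being left for its finalizer:

```csharp
using System;
using System.Security.Cryptography;

public static class HexHashSketch
{
    public static string Sha1Hex(ReadOnlySpan<byte> data)
    {
        // `using` guarantees Dispose runs, so no finalizer work is left for the GC.
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA1);
        hasher.AppendData(data);

        // A 20-byte stack buffer avoids the small array allocation that the
        // byte[]-returning GetHashAndReset() overload would make.
        Span<byte> digest = stackalloc byte[20];
        hasher.TryGetHashAndReset(digest, out _);

        // Convert.ToHexString reads directly from the span and emits upper-case hex.
        return Convert.ToHexString(digest);
    }
}
```

For example, hashing the ASCII bytes of `"abc"` yields the standard SHA-1 test vector `A9993E364706816ABA3E25717850C26C9CD0D89D`.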

Webreaper commented 1 year ago

This is totally awesome - thank you! I'll take a look later and merge.

That is a seriously good memory optimisation. πŸ˜πŸ‘Œ