Arlodotexe commented 2 years ago

Background

In the Strix Music SDK, all image models (interfaces inheriting from IImageBase) use a Uri to provide images.

The problem

Unlike most of the data in the SDK, an image is not raw data. Like an audio stream, it is a resource.

As part of the standard, all data should be provided directly by the cores as data, rather than as Uris pointing to a resource elsewhere. The only exception is anything that self-classifies as an external resource (UriCollection).

There are a number of reasons for not using Uris:

- It allows plugins to interact with the data, for example:
  - Resource optimization
  - Local caching
- Uris might be behind a nonstandard protocol or an inaccessible address, which would block access to that data.
- A stream gives direct access to the data, meaning:
  - We can move all image resize processing from the file scanner to a plugin.
  - It works even if the data is fabricated in memory and would never have been accessible from a Uri.

The solution

Change IImageBase to use Stream instead of Uri.

- Remove the Uri property.
- Add an OpenStreamAsync method that returns a Task<Stream>.
- Ensure all of the following get the new implementation (we recommend ReSharper's "Apply Refactoring" here):
  - [ ] BaseModels (start here)
  - [ ] CoreModels
  - [ ] AppModels
  - [ ] AdapterModels
  - [ ] PluginModels
  - [ ] ViewModels
  - [ ] Model plugins
  - [ ] Mock models
  - [ ] Unit tests
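To make the shape of the change concrete, here is a minimal sketch in Python rather than C#. `ImageBase` is a hypothetical analogue of IImageBase and `open_stream_async` stands in for OpenStreamAsync returning a Task<Stream>; `InMemoryImage` and `read_all` are made-up names illustrating the "fabricated in memory" case, not SDK types.

```python
import asyncio
import io
from typing import BinaryIO, Protocol


class ImageBase(Protocol):
    """Hypothetical analogue of IImageBase after the change: no Uri, only a stream."""

    async def open_stream_async(self) -> BinaryIO:
        """Open a fresh, readable stream over the image bytes."""
        ...


class InMemoryImage:
    """An image whose bytes only exist in memory and have no Uri at all."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    async def open_stream_async(self) -> BinaryIO:
        # Each call returns a new stream positioned at the start,
        # so independent consumers never interfere with each other.
        return io.BytesIO(self._data)


async def read_all(image: ImageBase) -> bytes:
    """Consume any ImageBase without knowing where its bytes come from."""
    stream = await image.open_stream_async()
    with stream:
        return stream.read()


if __name__ == "__main__":
    image = InMemoryImage(b"\x89PNG\r\n\x1a\n")
    print(len(asyncio.run(read_all(image))))
```

Returning a new stream per call is a design choice in this sketch: it keeps plugins such as a resizer or a cache from having to coordinate read positions on a shared stream.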

Arlodotexe commented 2 years ago

If we're moving from Uri to Stream, we should also add a MimeType or ContentType property to help the client render in the correct format.
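Ideally the core that owns the image simply reports its MIME type. As a hedged fallback for a core that only has the bytes, the type could be guessed from the well-known file signature at the start of the stream. This is only a sketch: `sniff_mime_type` is a hypothetical helper, not part of the SDK, and it covers just a few common formats.

```python
from typing import Optional

# Well-known leading signatures ("magic numbers") for common image formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]


def sniff_mime_type(header: bytes) -> Optional[str]:
    """Guess an image MIME type from the first bytes of a stream, or None if unknown."""
    for signature, mime in _SIGNATURES:
        if header.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None
```

Reading the first 12 bytes of the stream is enough for every signature listed here.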

Arlodotexe commented 2 years ago

Merging images

We're currently using the Uri property to determine if an image should or shouldn't be merged with another image.

We need a solution to cover this once Uri is removed.

Generate a checksum

The easiest way to tell whether two files are the same is to compare them byte by byte. That work can be done ahead of time by computing a checksum over all the bytes (MD5, SHA-1, SHA-256, etc.) and comparing the checksums instead.

This is pretty close to how we avoid processing duplicate images in the SDK for scanned audio files.
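Because images now arrive as streams, a checksum can be computed without loading the whole image into memory. A minimal sketch using Python's `hashlib`, assuming SHA-256 as the algorithm; `stream_checksum` is a hypothetical helper name.

```python
import hashlib
import io
from typing import BinaryIO


def stream_checksum(stream: BinaryIO, algorithm: str = "sha256", chunk_size: int = 64 * 1024) -> str:
    """Hash a stream in fixed-size chunks so large images never need to fit in memory at once."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    a = stream_checksum(io.BytesIO(b"cover art"))
    b = stream_checksum(io.BytesIO(b"cover art"))
    print(a == b)
```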

However, with this new setup, the calculation must be done somewhere, and it isn't obvious where this should be.

- If implemented in CoreModels:
  - [con] All cores need to agree on the same hashing algorithm.
  - [con] Cores might choose not to implement this because of the extra effort.
  - [pro] Existing hashes that come from a server or a file system can be reused, so the calculation can potentially be sidestepped entirely.
  - [pro] If a core doesn't provide one, we can do the calculation for it when it's needed for merging.
- If implemented in AppModels:
  - [con] No option for lazy calculation, so it's potentially more resource-intensive.
  - [con] To stay accurate, the checksum must be generated at runtime, every time. There's no way to associate an IImage with a known checksum.
  - [pro] Can be done on the fly during the collection merging process.
  - [pro] The SDK decides on the hashing algorithm.
  - [pro] Cores don't need to implement anything; they just give us a data stream.
  - [pro] Opens the door to highly optimized image comparison, rather than comparing every byte.
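The two CoreModels pros combine naturally: trust a hash the core already has, and otherwise compute one lazily, only when merging asks for it. A minimal sketch, assuming SHA-256 as the agreed algorithm; `ImageSource`, `known_sha256` and `should_merge` are hypothetical names, not SDK members.

```python
import hashlib
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Callable, Optional


@dataclass
class ImageSource:
    """Hypothetical stand-in for an image handed to the merger by a core."""

    open_stream: Callable[[], BinaryIO]
    # A SHA-256 hex digest the core already knows (e.g. from a server or file system).
    known_sha256: Optional[str] = None
    _computed: Optional[str] = field(default=None, init=False, repr=False)

    def sha256(self) -> str:
        # Prefer the core-provided hash so the stream never has to be opened.
        if self.known_sha256 is not None:
            return self.known_sha256
        # Otherwise compute it the first time it's requested, then cache it.
        if self._computed is None:
            digest = hashlib.sha256()
            with self.open_stream() as stream:
                for chunk in iter(lambda: stream.read(64 * 1024), b""):
                    digest.update(chunk)
            self._computed = digest.hexdigest()
        return self._computed


def should_merge(a: ImageSource, b: ImageSource) -> bool:
    """Merge two images only when their checksums are identical."""
    return a.sha256() == b.sha256()


if __name__ == "__main__":
    a = ImageSource(lambda: io.BytesIO(b"cover art"))
    b = ImageSource(lambda: io.BytesIO(b"cover art"))
    print(should_merge(a, b))
```

Note that a core-provided hash is trusted as-is, which is exactly why all cores would need to agree on the same algorithm.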

Simple Heuristic image comparison

Since we know we're dealing with an image and have access to the raw bytes, some simple heuristic comparisons might be more than enough for now.

- Are the streams the same length? If not, they don't match.
- Check the metadata: are the images the same resolution? If not, they don't match.
- Ignoring metadata, check the actual image bytes. If they stop matching before the end of the stream, the images don't match.
  - We can skip X bytes at a time and make X user-configurable, a speed/accuracy dial of sorts.
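A minimal sketch of the length check and the sampled byte comparison, assuming both streams are seekable. The `stride` parameter plays the role of X (1 compares every byte); the resolution check is omitted because it needs an image decoder. `probably_same_image` is a hypothetical name.

```python
import io
from typing import BinaryIO


def _stream_length(stream: BinaryIO) -> int:
    # Assumes a seekable stream: jump to the end to measure it, then rewind.
    length = stream.seek(0, io.SEEK_END)
    stream.seek(0)
    return length


def probably_same_image(a: BinaryIO, b: BinaryIO, stride: int = 1, chunk_size: int = 64 * 1024) -> bool:
    """Heuristic check for whether two seekable image streams hold the same image.

    stride=1 compares every byte; a larger stride compares only every
    stride-th byte, trading accuracy for speed.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    # Different lengths can never be the same file.
    if _stream_length(a) != _stream_length(b):
        return False
    position = 0
    while True:
        # Assumes read() returns full chunks until EOF (true for BytesIO and buffered files).
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a:
            return True  # reached the end without finding a mismatch
        # Index of the first byte in this chunk whose absolute offset is a multiple of stride.
        start = (-position) % stride
        if chunk_a[start::stride] != chunk_b[start::stride]:
            return False  # stop at the first sampled mismatch
        position += len(chunk_a)
```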

Perceptual hashing algorithm (image fingerprinting)

See https://www.hackerfactor.com/blog/?%2Farchives%2F432-Looks-Like-It.html

This technique is similar to acoustic fingerprinting, but for images.

Rather than generating hashes that represent the exact bytes, like MD5 or SHA-1, this technique aims to generate hashes that approximately represent what the image looks like.

By reducing the size, reducing the color, averaging the remaining colors, etc., you can compute a hash from the remaining bytes that is very close to the hashes generated for similar images.

From there, you can count how many bits don't match (the Hamming distance) and use that to determine how similar two images are.
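Counting mismatched bits is a single XOR followed by a population count. A minimal sketch for 64-bit hashes; `hamming_distance` and `similarity` are hypothetical helper names.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two hashes differ (XOR leaves a 1 at each mismatch)."""
    return bin(hash_a ^ hash_b).count("1")


def similarity(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Fraction of matching bits: 1.0 means identical hashes, 0.0 means every bit differs."""
    return 1 - hamming_distance(hash_a, hash_b) / bits


if __name__ == "__main__":
    print(hamming_distance(0b1011, 0b1001))
```

Where to draw the "same image" line on that distance is a tuning decision this sketch leaves open.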

Example

Pulling from our sources, let's create a fingerprint for an image. This is a minimal example to help you (and me) understand the process; there are significant improvements you can make with further research.

Original image:

Reduce the image size

- The smaller the image, the more detail is removed, and the more closely it can match other images.
- It makes the original image size irrelevant.
- Fewer pixels means less color complexity.
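A pure-Python sketch of the shrink step, assuming the image is already decoded into rows of per-pixel channel tuples. It uses a simple box filter (each output pixel is the average of its source block) and deliberately ignores aspect ratio; in the SDK this would more likely be a library resize (e.g. ImageSharp). `downscale` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, ...]


def downscale(pixels: Sequence[Sequence[Pixel]], size: int = 8) -> List[List[Pixel]]:
    """Shrink an image to size x size by averaging each block of source pixels.

    Assumes the image is at least size x size, so every block is non-empty.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Average each channel independently across the block (integer division).
            row.append(tuple(sum(channel) // len(block) for channel in zip(*block)))
        result.append(row)
    return result
```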

Here's the same image, but at 8x8 (64 pixels):

Blown up to the original size again:

Reduce color

- Convert to grayscale.
- Each pixel's color is now somewhere between black and white.
- We now have 64 gray values in the image, one per pixel.
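A sketch of the grayscale step, assuming 8-bit RGB input. It uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice among several; `to_grayscale` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def to_grayscale(pixels: Sequence[Sequence[Tuple[int, int, int]]]) -> List[List[int]]:
    """Convert RGB pixels to 0-255 gray values using BT.601 luma weights (integer math)."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row] for row in pixels]
```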

Average the colors

Given the 64 gray values we now have, find their average (mean) value and record it.

Compute the bits

Using the average from before, go through each byte and do the following:

- If the byte is above the mean, write a 1 (black).
- If the byte is below the mean, write a 0 (white).
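The averaging and thresholding steps fit in a few lines. A minimal sketch, assuming an 8x8 grid of gray values read row by row; a byte exactly equal to the mean is treated as 0 here, which is an assumption since the steps above don't cover that case. `average_hash_bits` is a hypothetical name.

```python
from typing import List, Sequence


def average_hash_bits(gray: Sequence[Sequence[int]]) -> List[int]:
    """Turn a grid of gray values into one bit per pixel, row by row.

    A bit is 1 when the pixel is above the mean and 0 otherwise
    (pixels exactly at the mean become 0 in this sketch).
    """
    values = [v for row in gray for v in row]
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]
```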

The result is a new image containing a rough outline of ONLY the most prominent features in the original:

Blown up to the original size again:

Lastly, create the hash.

- The image created in the previous step is an 8x8 image made up of 1s and 0s, for a total of 64 bits.
- This can be converted to a 64-bit integer: 10319777633083913424
- Then, we can convert from base 10 to base 16 for a hex value: 8f373714acfcf4d0. This is our hash.
- Run this again with a different image and compute the Hamming distance between the two hashes to see how similar the two images are!
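Packing the bits and formatting the result can be checked against the numbers above. A minimal sketch, assuming the first bit is the most significant; `bits_to_hash` and `hash_to_hex` are hypothetical names. The usage at the bottom round-trips the example integer through its 64 bits.

```python
from typing import Sequence


def bits_to_hash(bits: Sequence[int]) -> int:
    """Pack bits (most significant first) into a single integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hash_to_hex(value: int) -> str:
    """Format a 64-bit hash as 16 lowercase hex digits, keeping leading zeros."""
    return format(value, "016x")


if __name__ == "__main__":
    example = 10319777633083913424
    example_bits = [(example >> (63 - i)) & 1 for i in range(64)]
    print(hash_to_hex(bits_to_hash(example_bits)))  # prints 8f373714acfcf4d0
```

Zero-padding to 16 digits matters: without it, hashes whose top bits are 0 would print shorter and be harder to compare as strings.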

This is very doable from scratch with just ImageSharp, but there are improved versions of the algorithm that would serve us much better if needed.

Arlodotexe / strix-music

SDK images should use Stream instead of Uri #185
