Arlodotexe / strix-music

Combine any music sources into a single library. It's your music. Play it your way.
http://www.strixmusic.com

SDK images should use Stream instead of Uri #185

Closed Arlodotexe closed 2 years ago

Arlodotexe commented 2 years ago

Background

In the Strix Music SDK, all images (interfaces inheriting from IImageBase) use a Uri to provide the image.

The problem

Unlike most of the data in the SDK, images are not raw data. Like an audio stream, an image is a resource.

As part of the standard, with the exception of things that self-classify as an external resource (UriCollection), all data should be provided directly from the cores as data, instead of using Uris to point to the resource elsewhere.

There are a number of reasons for not using Uris:

The solution

Change IImageBase to use Stream instead of Uri.

Arlodotexe commented 2 years ago

If we're moving from Uri to Stream, we should also add a MimeType or ContentType property to help the client render in the correct format.

Arlodotexe commented 2 years ago

Merging images

We're currently using the Uri property to determine if an image should or shouldn't be merged with another image.

We need a solution to cover this once Uri is removed.


Generate a checksum

The easiest way to tell if 2 files are the same is to compare them byte for byte. The calculation can be done ahead of time by creating a checksum (MD5, SHA1, SHA-256, etc.).

This is pretty close to how we avoid processing duplicate images in the SDK for scanned audio files.

However, with this new setup, the calculation must be done somewhere, and it isn't obvious where this should be.
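Wherever the calculation ends up living, the checksum itself is small. Here's a minimal sketch in Python (not the SDK's own .NET code) that assumes the image arrives as a readable binary stream. The names `image_checksum` and `same_image_bytes` are made up for illustration, and SHA-256 is just one of the options listed above.

```python
import hashlib
import io
from typing import BinaryIO


def image_checksum(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 hex digest of everything readable from `stream`."""
    digest = hashlib.sha256()
    # Read in chunks so a large image never has to sit fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def same_image_bytes(a: BinaryIO, b: BinaryIO) -> bool:
    """Hypothetical merge check: equal checksums mean byte-identical content."""
    return image_checksum(a) == image_checksum(b)


if __name__ == "__main__":
    # Stand-in bytes, not a real image file.
    first = io.BytesIO(b"\x89PNG fake image bytes")
    second = io.BytesIO(b"\x89PNG fake image bytes")
    print(same_image_bytes(first, second))
```

Note that this only catches exact duplicates. The same artwork re-encoded or resized produces a completely different checksum.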


Simple heuristic image comparison

Since we know we're dealing with an image and have access to the raw bytes, some simple heuristic comparisons might be more than enough for now.


Perceptual hashing algorithm (image fingerprinting)

See https://www.hackerfactor.com/blog/?%2Farchives%2F432-Looks-Like-It.html

This technique is similar to acoustic fingerprinting, but for images.

Rather than generating hashes that represent the file byte for byte, like MD5 or SHA1, this technique aims to generate hashes that approximately represent what the image looks like.

By reducing size, reducing color, averaging the remaining colors, etc., you can compute a hash from the remaining bytes that is very close to what similar images would have generated.

From there, you can count how many bits don't match (a Hamming distance) and use that to determine how similar the two images are.
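Counting mismatched bits can be sketched like this in Python, assuming each fingerprint is stored as an integer of the same bit width (64 bits for an 8x8 hash). `looks_similar` and its `max_distance` default are placeholders for illustration, not a tuned threshold.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two equal-width hashes differ."""
    # XOR leaves a 1 in every position where the two hashes disagree.
    return bin(hash_a ^ hash_b).count("1")


def looks_similar(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Hypothetical merge check; max_distance is an arbitrary placeholder."""
    return hamming_distance(hash_a, hash_b) <= max_distance
```

A distance of 0 means the fingerprints are identical. The larger the distance, the less alike the images look.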

Example

Pulling from our sources, let's create an image fingerprint for an image. This is a minimal example to help you (and me) understand the process; there are significant improvements you can make if you keep researching.

Original image:

Reduce the image size

Here's the same image, reduced to 8x8 (64 pixels):

Blown up to the original size again:

Reduce color

Average the colors

Compute the bits

Using the average from before, go through each byte and do the following:

The result is a new image containing a rough outline of ONLY the most prominent features in the original:

Blown up to the original size again:

Lastly, create the hash.
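Putting the steps together, here's a rough pure-Python sketch of the whole pipeline (not ImageSharp, and not the SDK's code). Assumptions: the image is a nested list of RGB tuples at least 8x8 in size, grayscale uses the common Rec. 601 luminance weights, and a bit is set to 1 when a pixel is brighter than the mean, which is the rule the linked article uses for its average hash. All function names are made up for illustration. The grayscale step runs before the resize here for simplicity; both are linear averages, so apart from floating-point rounding the order doesn't change the result.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(image: List[List[Pixel]]) -> List[List[float]]:
    """Reduce color: collapse each RGB pixel to one luminance value."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def shrink(gray: List[List[float]], size: int = 8) -> List[List[float]]:
    """Reduce size: average rectangular blocks down to a size x size grid."""
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("this sketch only handles images of at least size x size pixels")
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        small_row = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [gray[y][x] for y in range(top, bottom) for x in range(left, right)]
            small_row.append(sum(block) / len(block))
        small.append(small_row)
    return small


def average_hash(image: List[List[Pixel]], size: int = 8) -> int:
    """Build a size*size-bit fingerprint, first pixel in the most significant bit."""
    values = [v for row in shrink(to_grayscale(image), size) for v in row]
    mean = sum(values) / len(values)  # Average the colors.
    fingerprint = 0
    for value in values:
        # Compute the bits: 1 if brighter than the average, otherwise 0.
        fingerprint = (fingerprint << 1) | (1 if value > mean else 0)
    return fingerprint


if __name__ == "__main__":
    # Hypothetical 16x16 test image: left half black, right half white.
    demo = [[(0, 0, 0)] * 8 + [(255, 255, 255)] * 8 for _ in range(16)]
    print(f"{average_hash(demo):016x}")  # prints 0f0f0f0f0f0f0f0f
```

Because each bit only records "brighter or darker than average", a uniformly dimmer or brighter copy of the same image tends to produce the same fingerprint.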

This is more than doable from scratch with just ImageSharp, but there are improved versions that would serve us much better if we need them.