ChrisRega / image-compare

image comparison in rust
MIT License

Question: What is the best way to compare animated gifs? #1

Open cole21771 opened 1 year ago

cole21771 commented 1 year ago

From what I can tell while using this crate, the comparison functions seem to use just the first frame of an animated GIF for comparison, not all frames.

To get around this, I was considering either comparing frame by frame manually or stacking the frames into a "sprite sheet" and comparing those.

I feel neither of these is a particularly good solution, but I'm not really certain what the best way to approach this problem is. Do you have any suggestions?

ChrisRega commented 1 year ago

Hi! Sorry for my delayed answer, and happy new year. Regarding your question, I have several remarks and suggestions.

What is the goal :)

First of all, you should think about what you want to measure: do you want a strictly mathematical comparison, or would you rather model the difference as perceived by a human? The second question to clarify is whether your input GIFs always have the same temporal resolution (the same frames per second).

How similar are your inputs w.r.t. temporal resolution

Since it only makes sense to compare frames at the same point in time, you may need to interpolate the higher-resolution GIF to the exact frame times of the lower-resolution one. E.g., GIF 1 has 33 fps and GIF 2 has 50 fps: it makes sense to use every frame of the 33 fps version and interpolate between frames of the 50 fps version, since interpolating between temporally closer frames reduces the interpolation error.
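As a minimal sketch (my construction, not part of the crate), such an interpolation could be a per-pixel linear blend between the two frames of the denser GIF that bracket the target timestamp, assuming all frames were already decoded to equally-sized `image::RgbImage`s:

```rust
use image::RgbImage;

/// Blend the two frames of the denser GIF that bracket a target timestamp.
/// `alpha` in [0, 1] is how far the timestamp lies between `before` and `after`.
fn interpolate_frame(before: &RgbImage, after: &RgbImage, alpha: f32) -> RgbImage {
    let mut out = before.clone();
    for (x, y, px) in out.enumerate_pixels_mut() {
        let b = after.get_pixel(x, y);
        for c in 0..3 {
            // per-channel linear interpolation
            px.0[c] = ((1.0 - alpha) * px.0[c] as f32 + alpha * b.0[c] as f32)
                .round() as u8;
        }
    }
    out
}
```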

The comparison

One dimensional approach

Assuming we now have two image stacks with identical temporal resolution and we just want "some diff", we are probably fine using the similarity score of the hybrid compare for each frame pair. The per-frame scores can then either be simply averaged over all frames (no need to take absolute values here, since the scores are non-negative) or root-mean-squared (sqrt(mean(score·score))). Both are valid choices, see any RMS vs. average absolute deviation discussion. I would personally opt for RMS because it respects the magnitude of extreme outliers, not just their existence.
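In code, the RMS variant could look roughly like this, assuming the 0.2.x API where `rgb_hybrid_compare` returns a `Similarity` with a `score` field, and that both stacks were already resampled to identical length and timing:

```rust
use image::RgbImage;

// Aggregate per-frame hybrid similarities into one RMS score.
fn stack_similarity_rms(
    a: &[RgbImage],
    b: &[RgbImage],
) -> Result<f64, image_compare::CompareError> {
    assert_eq!(a.len(), b.len());
    let mut sum_sq = 0.0;
    for (fa, fb) in a.iter().zip(b) {
        let score = image_compare::rgb_hybrid_compare(fa, fb)?.score;
        sum_sq += score * score;
    }
    Ok((sum_sq / a.len() as f64).sqrt())
}
```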

Multi dimensional approach

While the above will yield some similarity, it will not be very sensitive to anything close to perceived difference. For perception, the problem becomes multi-dimensional:

1. How different are the single frames of the two stacks (let's call this the image dimension $x$): $\delta_x(t) = \lVert A[t] - B[t] \rVert$
2. How temporally stable are the frames inside each stack (the temporal dimension $t$), e.g. as a floating average over a 3-frame window: $\delta_t(A, t) = \tfrac{1}{2}\left(\lVert A[t] - A[t-1] \rVert + \lVert A[t+1] - A[t] \rVert\right)$
3. How temporally stable are the differences between the frames: $\delta_{xt}(t) = \tfrac{1}{3}\left(\delta_x(t-1) + \delta_x(t) + \delta_x(t+1)\right)$

The first point was discussed already, but for the second dimension imagine the following: both image stacks are stable, e.g. each frame is identical. Any difference will be much easier for a human to spot than, say, an animated explosion with a single wrong frame. I would suggest taking the hybrid similarity between temporally adjacent frames into the model, maybe as a floating average over a time span distinguishable by humans, e.g. 1/20 s or so; in the equations above I just used a window of 3 frames to clarify the idea. If the temporal instability is high, simply down-weight any difference between the corresponding frames of the two stacks.

The third point is the stability of the difference. If a single frame of a temporally correlated image stack is completely different, it might not be noticed if it is short enough. The longer a difference persists, the stronger the perception of the difference will be. As an example, think of a 100 fps animation with 4 frames per second being totally different but nearly equidistant in time: frames 2, 33, 66, 99. These will be harder to spot than 4 adjacent different frames, because with adjacent frames the brain gets more time to process the difference.
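Wiring the three dimensions together could look roughly like the following rough, untested sketch (again assuming the 0.2.x `rgb_hybrid_compare`; the function names and the exact weighting scheme are improvised, not a fixed recipe):

```rust
use image::RgbImage;
use image_compare::{rgb_hybrid_compare, CompareError};

// delta_x(t): per-frame difference, here taken as 1 - hybrid similarity.
fn frame_diffs(a: &[RgbImage], b: &[RgbImage]) -> Result<Vec<f64>, CompareError> {
    a.iter()
        .zip(b)
        .map(|(fa, fb)| Ok(1.0 - rgb_hybrid_compare(fa, fb)?.score))
        .collect()
}

// delta_t(stack, t): instability as the mean difference to adjacent frames.
fn instability(stack: &[RgbImage], t: usize) -> Result<f64, CompareError> {
    let mut acc = 0.0;
    let mut n = 0.0;
    for u in [t.checked_sub(1), Some(t + 1)].into_iter().flatten() {
        if let Some(neighbor) = stack.get(u) {
            acc += 1.0 - rgb_hybrid_compare(&stack[t], neighbor)?.score;
            n += 1.0;
        }
    }
    Ok(if n > 0.0 { acc / n } else { 0.0 })
}

// Combine all three dimensions: smooth delta_x over a 3-frame window
// (delta_xt), down-weight frames where the stacks are themselves unstable,
// then take the RMS over time.
fn perceptual_stack_diff(a: &[RgbImage], b: &[RgbImage]) -> Result<f64, CompareError> {
    assert_eq!(a.len(), b.len());
    let dx = frame_diffs(a, b)?;
    let mut sum_sq = 0.0;
    for t in 0..dx.len() {
        let lo = t.saturating_sub(1);
        let hi = (t + 2).min(dx.len());
        let dxt = dx[lo..hi].iter().sum::<f64>() / (hi - lo) as f64;
        let w = 1.0 - 0.5 * (instability(a, t)? + instability(b, t)?);
        sum_sq += (w * dxt).powi(2);
    }
    Ok((sum_sq / dx.len() as f64).sqrt())
}
```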

That's it!

So this would be my first shot at modeling something like a perception-like difference metric between temporally equidistant image stacks. I could definitely imagine implementing something like the second approach if there's interest in it :)

@cole21771 Hope this helps a bit!

Kind regards!

cole21771 commented 1 year ago

Wow, this was a significantly more detailed response than I was expecting! You've definitely given me a lot to think about in how I want to approach this problem.

I'll give you some more context on my goal here. I maintain a website for fun which hosts ~8124 custom emojis from various Slack servers and Discords I'm in, to make them easily accessible to others: https://abbedu.to/emojis if you're curious.

For a while I've been working on an admin feature to bulk upload a large list of emojis from new Slack servers I join. For any new emoji that has the same name as an existing one, I want to compare the two to see how similar they are.

So I'm looking for human-perceived difference between images.

Obviously I quickly ran into the issue of animated GIFs, as I stated in my initial issue. If you're serious about your offer to work on a GIF comparison function, I would greatly appreciate it! But in the meantime I could definitely create something that works at a basic level given your feedback/suggestions.

Another issue I ran into is that rgb_hybrid_compare takes an Rgb8 image, so it doesn't support transparency. When using DynamicImage::into_rgb8(), the transparency is dropped, and any parts of the image that were transparent become whatever the underlying pixel values were, which most of the time seems to be RGB(0, 0, 0) (black). This is problematic for my use case, as images like thank-you-simple.png or amd.png become pitch black and then look 100% similar according to the algorithm. Is there a specific reason that you don't support Rgba8 images?
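A tiny, self-contained demonstration of the behavior I mean (my own construction, just using the `image` crate):

```rust
use image::{DynamicImage, Rgba, RgbaImage};

fn main() {
    // A fully transparent pixel still stores RGB values underneath; here they
    // happen to be (0, 0, 0), so into_rgb8() turns it into opaque black.
    let mut img = RgbaImage::new(2, 1);
    img.put_pixel(0, 0, Rgba([0, 0, 0, 0]));     // transparent
    img.put_pixel(1, 0, Rgba([255, 0, 0, 255])); // opaque red
    let rgb = DynamicImage::ImageRgba8(img).into_rgb8();
    assert_eq!(rgb.get_pixel(0, 0).0, [0, 0, 0]); // alpha information is gone
}
```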

In case it wasn't clear by now, I don't understand the math behind image comparison in the slightest. BUT I would be happy to learn if you have any resources you'd suggest for understanding this stuff, because a lot of the details in your math went straight over my head. I really appreciate you responding at all, let alone with so much detail. Thank you so much!

ChrisRega commented 1 year ago

Heyho!

Thanks, nice to hear it helped you. Since it's pretty late here (for a dad who needs to work tomorrow 😂), I'll just quickly answer the alpha question: it's not trivial to reason about how alpha should be taken into account w.r.t. perception; transparency is complex. I can think of at least three ways to handle it, but we would need data to check whether their respective results make sense. Initial 💡:

1. MSSIM over the alpha channel, just added to the hybrid rating; differences in transparency are perceivable after all, probably similarly to grayscale structure.
2. Weight the hybrid score with alpha: the more transparent a region, the less visible the difference.
3. Both.
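To illustrate idea 2, a sketch of the weighting (using a plain per-pixel difference rather than the crate's hybrid score; everything here is improvised, not the eventual implementation):

```rust
use image::RgbaImage;

// Scale each per-pixel RGB difference by the mean alpha of the two pixels,
// so differences in transparent regions contribute less to the total.
fn alpha_weighted_diff(a: &RgbaImage, b: &RgbaImage) -> f64 {
    assert_eq!(a.dimensions(), b.dimensions());
    let mut acc = 0.0;
    for (pa, pb) in a.pixels().zip(b.pixels()) {
        let alpha = (pa.0[3] as f64 + pb.0[3] as f64) / (2.0 * 255.0);
        let diff = (0..3)
            .map(|c| ((pa.0[c] as f64 - pb.0[c] as f64) / 255.0).abs())
            .sum::<f64>()
            / 3.0;
        acc += alpha * diff;
    }
    acc / (a.width() as f64 * a.height() as f64)
}
```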

If you are interested, I can implement that on a branch, and you can try to find/make some test data to evaluate which variant works best. It seems like you have a lot of similar images on your server.

Regards and good night 🌃 Chris

ChrisRega commented 1 year ago

@cole21771 I am currently working on an RGBA version of the hybrid compare. Would you mind testing it? You can find it on the branch "0.3.0-dev"; I provided the same API as known from the 0.2.x series. I would like to revamp the API of the crate for 0.3.0, so please don't take this API as stable.

I would love some feedback on this from you, especially if the numerical results are good enough for you to solve your problem.

Regards Chris

cole21771 commented 1 year ago

Hey @ChrisRega, I really appreciate you looking into this for me! I kept forgetting to get back to you, as I've been busy these past few weeks with other things. I had a quick chance to check how well it works, and it seems to be giving some majorly inflated scores for images that the previous algorithm didn't consider nearly as similar.

I have a bunch more image comparisons if you need further examples of interesting findings, so please don't hesitate to ask if there's something I can help you with. Here are a few I took the time to investigate:

Examples


Tesla Logo

https://emoji.slack-edge.com/T8U2SMNKG/tesla/832105356ddd0c58.png https://d3kykrewn4qzx5.cloudfront.net/tesla.jpg

These images are obviously similar to a human, but I would not want the algorithm to consider them the same, given the color difference, the transparency differences, and the added text at the bottom. I find it really strange that even my old processed score rated them 46% similar.

New score: 0.4181016683578491
Old score: 0.008480727672576904
Old processed score: 0.4677591323852539

Goose

https://emoji.slack-edge.com/T8U2SMNKG/goose/9fbeb19ac2225b54.png https://d3kykrewn4qzx5.cloudfront.net/goose.jpg

Once again, these are both obviously pictures of a goose, but one is from a game and the other is a real-life goose, and they look nothing alike. I have no clue why the new score thinks they're so similar; I would love to hear your theory on that, since I don't understand the math at all.

New score: 0.3890223503112793
Old score: 0.016255812719464302
Old processed score: 0.06096023693680763

200

https://emoji.slack-edge.com/T8U2SMNKG/200/e5388944a0cc701f.png https://d3kykrewn4qzx5.cloudfront.net/200.png

Similar images of "200", the second one obviously mocking the first. I'm impressed that it finds them 44% similar, and I would somewhat agree with that score if you ignore the missing transparency on the second one. But unfortunately, that's where the big concern lies: I would expect the white background on the second one to make them look significantly more dissimilar.

New score: 0.44058308005332947
Old score: 0.1358027458190918
Old processed score: 0.43305855989456177

ChrisRega commented 1 year ago

@cole21771 That's super interesting, thanks for the data. Could you provide the final input data to the algorithm, just so I can reproduce the numbers here? Especially interesting would be the case where the difference is so high.

Thanks in advance for your help and effort Chris

cole21771 commented 1 year ago

Sure, I'd love to give you some data to help you out. Did you have a particular way in mind for me to get it to you? I'm not certain how I could provide the data in any format other than the PNGs themselves, like the links above.

As of right now (ignoring the old processed score), I'm just doing a resize with Lanczos3 to make sure the image sizes are the same, and then calling rgb_hybrid_compare vs. rgba_hybrid_compare. For the RGB images, I'm just using DynamicImage::into_rgb8() to convert an RGBA image into RGB.
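Roughly, the pipeline looks like this (the exact calls are my reconstruction of the steps described above; file names are placeholders):

```rust
use image::imageops::FilterType;
use image::GenericImageView;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let existing = image::open("existing.png")?;
    // resize the candidate so both images have identical dimensions
    let candidate = image::open("candidate.png")?.resize_exact(
        existing.width(),
        existing.height(),
        FilterType::Lanczos3,
    );

    // old path: the RGB conversion silently drops alpha (the issue above)
    let old = image_compare::rgb_hybrid_compare(&existing.to_rgb8(), &candidate.to_rgb8())?.score;
    // new path on the 0.3.0-dev branch: alpha is kept
    let new = image_compare::rgba_hybrid_compare(&existing.to_rgba8(), &candidate.to_rgba8())?.score;

    println!("old: {old}, new: {new}");
    Ok(())
}
```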

ChrisRega commented 1 year ago

@cole21771 Yes, that's exactly the data I would need. Just save the images to PNG after preprocessing and send them to admin@vdop.org please :) Then I'll have a look in the next few days. Thanks in advance, Chris

ChrisRega commented 1 year ago

Hello @cole21771, sorry for the long absence, I had some minor health issues that stacked up. But I am back now :) I just had a look at the bottom two images: there was a bug on the branch in the alpha weighting of the similarity. With the commit from just now, I get:

Goose: 0.01106379833072424
200: 0.007834967225790024

That sounds reasonable to me. I also improved the visualization of the diffs a bit, see the attached images. Could you check whether this fixes your issues?

Attachments: diff_200, diff_goose