americanexpress / jest-image-snapshot

✨ Jest matcher for image comparisons. Most commonly used for visual regression testing.
Apache License 2.0

Add SSIM Comparison #201

Closed: omnisip closed this issue 4 years ago

omnisip commented 4 years ago

Pixel-to-pixel comparison can fail frequently, and more often than not it's because of minor variations in rendering and the compression algorithm. This issue was really common when I worked on designing live video streaming systems, so we switched away from PSNR models to something called SSIM (Structural Similarity). I believe that if this library implements it, it will have significantly fewer false positives.

A well-maintained JavaScript library exists for this today -- https://github.com/obartra/ssim . I would be happy to implement it in this package if I have the time.

github-actions[bot] commented 4 years ago

Thanks for opening your first issue. Pull requests are always welcome!

anescobar1991 commented 4 years ago

Thanks @omnisip this is interesting!

If you are willing to contribute, I think this would be an interesting option to add. Before spending too much time on it, though, could you quantify your statement about reducing false positives and also spend some time comparing the performance of the diff process (is it faster or slower to diff with SSIM)?

Also, looking at the readme for https://github.com/obartra/ssim, I have concerns about requiring node-gyp. If that is really the case, then I don't think I'd want this implemented.

If perf is comparable, false positives are drastically reduced, and node-gyp is not required, then feel free to contribute! The reason I am being so picky about this is that I want to make sure there is real benefit to users before adding an option that could overcomplicate the code and enlarge the surface area of what was supposed to be a very simple Jest utility.

omnisip commented 4 years ago

I expect it to be significantly faster and definitely more accurate, but we won't know until we try. Short answer to your node-gyp question: it's probably required, since it uses 'canvas' as a dependency.

But that's also the reason it's much faster. Pixelmatch has a serious disadvantage in being written in JS when lots of vector math is required. It's actually a double whammy: even if it's using uint32 arrays under the hood, JavaScript only has floating-point numbers.

That said, if there's a way to stub out the pixelmatch implementation, it's still worth a try. The accuracy difference will be night and day, and it'll be a lot faster.

Dan


anescobar1991 commented 4 years ago

Go ahead and try it out @omnisip!

omnisip commented 4 years ago

I was incorrect about it requiring separate or OS-specific dependencies. That's only the case if it needs a special image loader.

Here is an example of how much better it works.

These two files are the same, except one has been converted to JPEG. The SSIM score is 0.9999278172146295 with the fastest transform (bezkrovny).

Using pixelmatch or an equivalent algorithm, I end up with 96K pixels flagged as different and roughly 2.57% error. That's high for something that is perceptually impossible to distinguish.
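
For anyone who wants to reproduce that kind of check, here is a rough sketch using the two libraries directly; the compareBoth helper and the ImageData-shaped inputs are illustrative assumptions, not code from this repo:

// Sketch: score one decoded image pair with both algorithms.
// imgA / imgB are ImageData-like objects: { data: RGBA bytes, width, height }.
const { ssim } = require('ssim.js');
const pixelmatch = require('pixelmatch');

function compareBoth(imgA, imgB) {
  // SSIM: 1.0 means structurally identical; 'bezkrovny' is the fastest mode.
  const { mssim } = ssim(imgA, imgB, { ssim: 'bezkrovny' });
  // pixelmatch: count of pixels whose difference exceeds its threshold.
  const diffPixels = pixelmatch(imgA.data, imgB.data, null, imgA.width, imgA.height);
  return {
    ssimScore: mssim,                                        // ~0.99993 for the PNG-vs-JPEG pair above
    pixelDiffRatio: diffPixels / (imgA.width * imgA.height), // ~0.0257 for the same pair
  };
}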


anescobar1991 commented 4 years ago

This is great news! How much faster is it?

omnisip commented 4 years ago

How do we build a good test? Do you have a set of sample images you want to churn through?


anescobar1991 commented 4 years ago

We have not done much in terms of performance testing before, but we do have integration tests defined in https://github.com/americanexpress/jest-image-snapshot/blob/master/__tests__/integration.spec.js and a set of test images used by those tests in https://github.com/americanexpress/jest-image-snapshot/tree/master/__tests__/stubs, so you could use those.
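
If it helps, a rough timing harness over a pair of those stubs could look like the sketch below; the file names, the pngjs decoding, and the script itself are assumptions for illustration, not part of the repository's test suite:

// Rough benchmark sketch: time both algorithms on one pair of stub images.
const fs = require('fs');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');
const { ssim } = require('ssim.js');

const toImageData = (png) => ({ data: png.data, width: png.width, height: png.height });
const a = PNG.sync.read(fs.readFileSync('__tests__/stubs/TestImage.png'));
const b = PNG.sync.read(fs.readFileSync('__tests__/stubs/TestImageFailure.png'));

console.time('pixelmatch');
pixelmatch(a.data, b.data, null, a.width, a.height);
console.timeEnd('pixelmatch');

console.time('ssim (bezkrovny)');
ssim(toImageData(a), toImageData(b), { ssim: 'bezkrovny' });
console.timeEnd('ssim (bezkrovny)');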

omnisip commented 4 years ago

I'll check them. It's going to sound strange, but PNG might not be ideal for the analysis itself. You'd think it would be best, but its performance is going to die when text and images are merged together, and it's also not particularly fast at decoding.

JPEG might be a lot better for this application, since it will always be first pass (single generation loss) and the files will be a lot smaller. I need to check Huffman coding again, but I'm pretty sure it will dedupe better too, because of the way the format is organized into macroblocks -- meaning less Git repository bloat.

This option isn't possible with a pure pixel-by-pixel matching solution, but it is possible with an SSIM-based one.

I'll do some research and see what I come back with.

Dan


omnisip commented 4 years ago

Preliminary analysis shows pixelmatch being faster by about 5x in your best cases (where there are a lot of identical adjacent pixels, e.g. the same color) and about 2x slower in your worst cases, based on the samples in your repository.

Analysis-wise, though, there's no contest: SSIM is significantly better.

E.g. TestImage vs. TestImageFailure returns a 21.6% difference in pixels per pixelmatch, whereas SSIM returns a score of 11% similarity (with the bezkrovny model) and 21% similarity.

In your oversize case (LargeTestImage*), pixelmatch shows only a 1.2% difference in pixels. However, SSIM shows a 96.7% similarity with both the bezkrovny and standard models.

I'm thinking one could reasonably set a threshold with SSIM at 99% and never think twice about it.
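
Expressed with the failure-threshold options this library already exposes, a 99% SSIM cutoff would look roughly like the sketch below (illustrative only; the comparisonMethod option it assumes is the one added later in this thread):

// Illustrative sketch: fail when structural similarity drops below 99%,
// i.e. when the SSIM-based difference exceeds 1%.
// `image` is assumed to be a PNG Buffer produced by the test's own tooling.
expect(image).toMatchImageSnapshot({
  comparisonMethod: 'ssim',        // added by the PR discussed below
  failureThreshold: 0.01,          // 1% when failureThresholdType is 'percent'
  failureThresholdType: 'percent',
});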

Dan


anescobar1991 commented 4 years ago

Want to open a PR to add this as an option? The reason I am saying it should be an option is that I don't think we should have another breaking change anytime soon.

mdugue commented 4 years ago

This sounds really convincing. Can't wait to see further progress here!

omnisip commented 4 years ago

En route guys, en route.


omnisip commented 4 years ago

> Want to open a PR to add this as an option? The reason I am saying it should be an option is that I don't think we should have another breaking change anytime soon.

The PR should work perfectly, with no breaking changes for existing users. SSIM is implemented as a new comparisonMethod, with pixelmatch remaining the default. I'm really looking forward to hearing your feedback! :-)
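
For anyone landing here later, opting in looks roughly like this minimal sketch (the screenshot path is a placeholder; omitting comparisonMethod keeps the existing pixelmatch behaviour):

// Minimal sketch of enabling the SSIM comparator added by the PR.
const fs = require('fs');
const { toMatchImageSnapshot } = require('jest-image-snapshot');

expect.extend({ toMatchImageSnapshot });

it('matches the page snapshot', () => {
  const screenshot = fs.readFileSync('screenshots/home.png'); // placeholder path
  expect(screenshot).toMatchImageSnapshot({
    comparisonMethod: 'ssim', // omit to keep the pixelmatch default
  });
});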

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity.

jhildenbiddle commented 4 years ago

Thank you @omnisip for the excellent work.

I tried comparisonMethod: 'ssim' this morning and so far I am pleased with the results. Like others, I was running into issues comparing screenshots across different operating system and browser combinations (specifically, using Jest and Playwright to test Chromium, Firefox, and Webkit on macOS, Ubuntu, and Windows). I know that the recommendation from the maintainers is to test inside a Docker container to avoid false positives, but this would involve significantly more work than just having jest-image-snapshot perform smarter comparisons.

Here is an example of an image snapshot (from Chromium) that we're testing against:

[image: example-test-js-example-tests-image-snapshots-1-chromium-snap]

The challenge (as has been covered in other issues) is how to handle the visual differences between screenshots generated using the same browser (which should match) on different operating systems. Previously, I was using the following settings for pixel-based comparisons:

customDiffConfig: {
  threshold: 0.3,
},
failureThreshold: 0.04

These more-or-less worked, but it felt like significant accuracy was being lost by setting threshold to 0.3. After setting comparisonMethod to ssim, I am now using the following settings:

failureThreshold: 0.15

The recommended default setting of failureThreshold: 0.01 resulted in a ~12% difference between the above screenshot rendered in headless Chromium on macOS and on Ubuntu, hence the 0.15 setting (I wanted to give myself a little wiggle room for future tests).
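
Spelled out, the SSIM settings described above amount to something like this sketch (option names follow the jest-image-snapshot API; the values are the ones quoted in this comment and in the fuller configuration shared later in the thread):

// Sketch of the option object implied by the description above.
// `screenshot` is assumed to be a PNG Buffer produced by the test's own tooling.
expect(screenshot).toMatchImageSnapshot({
  comparisonMethod: 'ssim',
  failureThreshold: 0.15,          // ~12% observed cross-OS difference plus headroom
  failureThresholdType: 'percent',
});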

It's hard to tell which combination of comparison method and threshold settings allows for greater accuracy, but my gut tells me ssim will be less picky in the long run. Then again, maybe I'm just favoring The Shiny New Thing. Time will tell.

omnisip commented 4 years ago

Can you send me the samples you compared, along with the diff images? I'd like to see these at 0.01 with both ssim: 'bezkrovny' and ssim: 'fast'. (Note: 'fast' is not faster than bezkrovny, but it is more accurate.)


jhildenbiddle commented 4 years ago

@omnisip --

Of course. For what it's worth, these are just example tests I'm using while I get our e2e configuration in place (switching from Cypress.io). I was surprised to learn just how much text rendering differences alone can complicate screenshot comparisons.

Original Snapshots

These snapshots were generated on macOS using Playwright's page.screenshot() feature.

macos-chromium: [image: macos-chromium-snap]

macos-firefox: [image: macos-firefox-snap]

macos-webkit: [image: macos-webkit-snap]

Image Diffs

Our CI test matrix renders the same content seen in the snapshots above using Chromium, Firefox, and Webkit on macOS, Ubuntu, and Windows. Screenshots are taken for each OS+browser combination and then compared to the reference snapshot of the matching browser type. For example, ubuntu-chromium.png will be compared to macos-chromium.png but not to macos-firefox.png or macos-webkit.png. This is done in hopes of making screenshot comparisons more accurate.
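
A sketch of how that per-browser pairing can be wired up; it mirrors the customSnapshotIdentifier shown in the configuration later in this thread, and page / browserName are assumed to come from the Playwright test setup:

// Sketch: name snapshots per browser so an Ubuntu or Windows run is compared
// against the macOS reference for the same browser, never across browsers.
async function checkSnapshot(page, browserName) {
  expect(await page.screenshot()).toMatchImageSnapshot({
    customSnapshotIdentifier: ({ defaultIdentifier }) =>
      `${defaultIdentifier}-${browserName}`, // e.g. "...-chromium", "...-firefox"
  });
}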

Here are the diff images and statistics generated by jest-image-snapshot:

ubuntu-chromium: 11.116726154016055% different from snapshot (102451.74823541196 differing pixels) [image: ubuntu-chromium-diff]

ubuntu-firefox: 0.4187165861719522% different from snapshot (3858.8920581607113 differing pixels) [image: ubuntu-firefox-diff]

ubuntu-webkit: 12.17770280391608% different from snapshot (112229.7090408906 differing pixels) [image: ubuntu-webkit-diff]

windows-chromium: 1.4968674877126276% different from snapshot (13795.130766759576 differing pixels) [image: windows-chromium-diff]

windows-firefox: 2.087652134020601% different from snapshot (19239.80206713386 differing pixels) [image: windows-firefox-diff]

windows-webkit (NOTE: a font rendering issue causes windows+webkit to have an unusually high diff percentage): 26.405261426690828% different from snapshot (243350.8893083827 differing pixels) [image: windows-webkit-diff]

Thanks for taking a look at these, btw. Very much appreciated!

omnisip commented 4 years ago

Okay cool.

1) Are you using the default ssim choice or 'fast'?

2) Are there any size mismatches and/or do you have allowSizeMismatch turned on?

3) How are you controlling for different versions of each browser on each platform?


jhildenbiddle commented 4 years ago

> 1) Are you using the default ssim choice or 'fast'?

Default.

> 2) Are there any size mismatches and/or do you have allowSizeMismatch turned on?

Yes, allowSizeMismatch is set to true. I believe the width of the Windows screenshots was 1px less than that of the macOS screenshots they were being compared to, which led me to enable this option.

> 3) How are you controlling for different versions of each browser on each platform?

I'm not. I've been operating under the assumption that Playwright uses the same browser version across operating systems, based on the browser/platform matrix shown in the repo's README.md:

[image: Screen Shot 2020-07-24 at 1 04 35 AM]

omnisip commented 4 years ago

First, please try ssim: 'fast' for comparison.

It's defined with an example in the readme. When you're done, send me the new diffs with it.

If that doesn't work, we probably need to fix the size mismatches. They can really mess up comparisons across the board even if it's only one pixel.


jhildenbiddle commented 4 years ago

Here are the snapshot diff statistics and images.

TL;DR: using ssim: 'fast' slightly increases the diff percentage, which would be my expectation given that fast is slower-but-more-accurate than the default bezkrovny setting.

Configuration

{
  allowSizeMismatch: true, // Windows CI fix
  comparisonMethod: 'ssim',
  customDiffConfig: {
    ssim: 'fast',
  },
  customSnapshotIdentifier(data) {
    return `${data.defaultIdentifier}-${browserName}`;
  },
  diffDirection: 'vertical',
  failureThreshold: 0.01,
  failureThresholdType: 'percent',
  noColors: true,
  runInProcess: true, // macOS CI fix
}

Regarding allowSizeMismatch, this is required only for our Windows screenshots, which for some reason are 1px narrower than the reference screenshots. I'm less worried about this at the moment because the Windows diff statistics show either a low percentage difference (chromium-diff = 1.7%, firefox-diff = 2.7%) or diff percentages in line with Ubuntu (ubuntu-webkit-diff = 12.9%, windows-webkit-diff = 13.9%), indicating that the 1px size mismatch isn't a huge issue.

Original Snapshots

Same as the ones posted in https://github.com/americanexpress/jest-image-snapshot/issues/201#issuecomment-663325205.

Image Diffs

ubuntu-chromium-diff: 12.125042040334522% different from snapshot (111744.38744372295 differing pixels) vs. 11.116726154016055% using bezkrovny [image: ubuntu-chromium-diff]

ubuntu-firefox-diff: successful match using SSIM with failureThreshold: 0.01 and ssim: 'fast'! 🥳 vs. 0.4187165861719522% using bezkrovny

ubuntu-webkit-diff: 12.9088979174103% different from snapshot (118968.40320685333 differing pixels) vs. 12.17770280391608% using bezkrovny [image: ubuntu-webkit-diff]

windows-chromium-diff: 1.6544634425014748% different from snapshot (15247.535086093592 differing pixels) vs. 1.4968674877126276% using bezkrovny [image: windows-chromium-diff]

windows-firefox-diff: 2.378168241687295% different from snapshot (21917.19851539011 differing pixels) vs. 2.087652134020601% using bezkrovny [image: windows-firefox-diff]

windows-webkit-diff (NOTE: the font-rendering issue that caused the unusually high diff percentage (26.4%) in the windows-webkit-diff screenshot in the previous post has been fixed by using playwright@next, so the statistics here should not be compared to those numbers, i.e. switching to ssim: 'fast' did not reduce the diff percentage from 26.4% to 13.89%): 13.891127611529564% different from snapshot (128020.63206785645 differing pixels) vs. 26.405261426690828% using bezkrovny with the font-rendering bug [image: windows-webkit-diff]

jhildenbiddle commented 4 years ago

@omnisip --

Adding one more diff that may be useful. Specifically, these are small snapshots taken using the same browser (Webkit) on different platforms (macOS & Windows) that result in a high diff percentage. This is easily addressed by increasing failureThreshold to any value greater than the diff percentage (e.g. 0.16), but then test accuracy is lost.

macOS Webkit Snapshot

[image: macos-webkit-snap]

Windows Webkit Snapshot

[image: windows-webkit-snap]

Diff: 15.603094816705564% different (1446.250858560439 differing pixels) [image: windows-webkit-diff]

Perhaps my hopes are unrealistic, but I was hoping ssim would allow me to handle relatively small structural image differences like these.

omnisip commented 4 years ago

If what I saw in the larger images is also happening to these small ones, the tests are properly failing.

Wait until I get back to my desk and I'll show you what I mean.


jhildenbiddle commented 4 years ago

Sounds good. Thanks, @omnisip.

FWIW, switching to the following pixel-based comparison settings allows the smaller "Docsify Test" image comparison to pass:

customDiffConfig: {
  threshold: 0.3,
},
failureThreshold: 0.04,

I can twiddle the knobs on both pixel- and SSIM-based comparisons to get tests to pass (knowing that doing so is less ideal than testing on a single OS using a Docker container). The challenge for me is understanding which comparison method is the better choice once the "good enough to pass tests" threshold(s) are set. The best I can do is review the pixelmatch demo and the SSIM playground and try to judge for myself. As stated earlier, my hope (which was perhaps unrealistic) was that SSIM would provide a clear and significant advantage over pixel-based comparisons for scenarios like mine (same content, slight differences in text rendering). It appears that this isn't the case, which could lead others to be equally confused about which comparison method they should pick. My assumption is that many users will latch on to the "reduced false positives" claim and opt for ssim without any real understanding of whether or how it is a better option. Just my $0.02.

omnisip commented 4 years ago

See below:

[image]

If you look closely, those aren't exactly minor structural differences. Look how far off the 's's are in both words and the offset in the T.

However, what you're doing isn't wrong. You're trying to determine how close one platform's version of a screen is to another -- so the algorithm may not be tuned properly for this comparison.

If you're willing to experiment, you can adjust the search window size for the SSIM library. This is the NxN bounding box of pixels it uses to calculate the changes in each block of the screen. The default is 11px.

An example configuration would be { ssim: 'fast', windowSize: 24 }, to try the regular SSIM algorithm with a 24x24 pixel window.
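
In jest-image-snapshot terms, that experiment would be passed through customDiffConfig, roughly as in the sketch below (this assumes the SSIM options are forwarded to ssim.js unchanged; screenshot is a placeholder for the test's own PNG Buffer):

// Sketch: widen the SSIM search window from the default 11px to 24px.
expect(screenshot).toMatchImageSnapshot({
  comparisonMethod: 'ssim',
  customDiffConfig: {
    ssim: 'fast',
    windowSize: 24, // NxN block used for each local comparison; default is 11
  },
  failureThreshold: 0.01,
  failureThresholdType: 'percent',
});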


omnisip commented 4 years ago

Picture reupload for viewers / commenters --

[image]

jhildenbiddle commented 4 years ago

Thank you for the excellent feedback, @omnisip. SSIM's window option looks interesting, so I'll continue to experiment.

Definitely learned a few new things along the way, so thank you for your time and effort. Very much appreciated.

l-abels commented 4 years ago

I saw a significant increase in failures with SSIM. Images that previously beat a threshold of 0.08 are now in the 0.15-0.25 range. I'll try increasing the window size before I increase the threshold to 3%. I'd be pretty worried about false negatives at that level.

anescobar1991 commented 4 years ago

Yeah that's pretty high. Want to join https://one-amex.slack.com/ to discuss more? There is a #jest-image-snapshot channel there.

omnisip commented 4 years ago

Can you send us before, after, and diff?
