Betterment / alchemist

A Flutter tool that makes golden testing easy.
MIT License

request: Add diff tolerance #90

Open itsJoKr opened 1 year ago

itsJoKr commented 1 year ago

Is there an existing feature request for this?

Description

For golden tests, you can add a custom comparator that allows some tolerance: https://github.com/flutter/flutter/pull/77014. This option should also be available in Alchemist, ideally via AlchemistConfig.
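
For reference, a tolerance-aware comparator along the lines of that Flutter PR might look roughly like the sketch below, built on flutter_test's LocalFileComparator and ComparisonResult; the class name and the 0.5% threshold are just illustrations, not anything Alchemist ships.

import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter_test/flutter_test.dart';

// Illustrative only: accepts goldens whose diff stays under a threshold.
class ToleranceFileComparator extends LocalFileComparator {
  ToleranceFileComparator(Uri testFile, {this.tolerance = 0.005})
      : super(testFile);

  // Maximum allowed diff as a fraction of pixels (0.005 == 0.5%).
  final double tolerance;

  @override
  Future<bool> compare(Uint8List imageBytes, Uri golden) async {
    final result = await GoldenFileComparator.compareLists(
      imageBytes,
      await getGoldenBytes(golden),
    );
    if (result.passed || result.diffPercent <= tolerance) {
      return true;
    }
    throw FlutterError(
      await generateFailureOutput(result, golden, basedir),
    );
  }
}

You would then assign an instance to the top-level goldenFileComparator in flutter_test_config.dart; as later comments in this thread show, Alchemist installs its own comparator, so this only illustrates the underlying flutter_test mechanism.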

Reasoning

I wanted to do this as a workaround because our golden tests have been failing with a 0.00% diff (there can be as little as 1 pixel of difference) since Flutter 3.7.0.

I don't know what the reason for this failure is, and I'm not sure it's related to Alchemist, so I didn't want to open a bug report here.

I don't see how else I can fix this, since all tests work fine on my local machine.

Additional context and comments

No response

jeroen-meijer commented 1 year ago

Hi there,

Thanks for opening an issue!

While we can definitely work on a feature that lets users define a margin of error on golden test comparisons, I think that would go against our idea of how these tests should be used. In my opinion, allowing any pixel difference above an absolute 0% impacts the integrity of the test and may become a slippery slope.

Then again, if enough people want it, that might warrant us working on this.

My initial advice would be: have you tried regenerating all the golden files locally using Flutter 3.7.0 and committing them to source control?

That should solve your issue. If it doesn't, we have another problem on our hands, but I don't think allowing more than a 0% diff would be the answer; it would be more of a mitigation for a problem that shouldn't be there in the first place.

I might be wrong though. Let me know what you think! 😄

itsJoKr commented 1 year ago

@jeroen-meijer

have you tried regenerating all the golden files locally using Flutter 3.7.0 and committing them to source control?

Yes, I did that. I went back to confirm it now, and I don't see that I'm doing anything wrong.

The widgets in our golden tests involve somewhat advanced canvas and RenderObject painting, so for most people everything is probably working fine: https://github.com/infinum/flutter-charts/actions/runs/4062513906/jobs/6993660895

Since this package promises to solve the issue of local vs. CI goldens, this is indeed a real problem for the package. But I wasn't sure whether the issue is in Alchemist or in the framework, which is why I didn't open a bug report. Now that I think about it, though, that's the real case we need to address here.

Bassiuz commented 1 year ago

@itsJoKr I think I'm having similar issues after upgrading. For me, image rendering seems a bit spotty; some pixels aren't rendered the same.

While I agree that a tolerance would be nice as a temporary fix, it addresses the effect, not the cause.

itsJoKr commented 1 year ago

@Bassiuz Exactly. For me, it's just 1 or 2 pixels on the whole image (resulting in a 0.00% diff).

Bassiuz commented 1 year ago

What seems to be the problem for us is that the CI runs on M1, while all our local laptops are x86_64. Possibly image rendering changed for one of the architectures in Flutter 3.7.

itsJoKr commented 1 year ago

What seems to be the problem for us is that the CI runs on M1, while all our local laptops are x86_64. Possibly image rendering changed for one of the architectures in Flutter 3.7.

In my case it's the opposite: an M1 laptop locally and Ubuntu x64 on GitHub Actions. But that also sounded like a possible reason to me.

Bassiuz commented 1 year ago

@jeroen-meijer Do you know if this is something caused by the latest Alchemist update, or is it related to Flutter itself?

jeroen-meijer commented 1 year ago

Thanks for all the additional info everybody :)

As far as I'm aware, we didn't push any breaking updates, so this is something we'd need to explore further. Whether it's a discrepancy between M1 and x86 renders, or it has to do with Flutter 3.7.0 (or both), we'll have to see.

I'll discuss it with the team, but IMO an Alchemist update that allows for a pixel difference might not be a bad idea. I'll let you all know A.S.A.P.!

jeroen-meijer commented 1 year ago

Hi there,

We discussed this internally. @marcossevilla mentioned the issue might have to do with the new Impeller rendering engine. See this issue for more details: flutter/flutter#119234

We've started working on a temporary dev version of Alchemist that supports setting a tolerance. This is an experimental version that we don't recommend using if you can avoid it, but from our limited testing it should be pretty stable and does what y'all need. Here's the branch with the changes: https://github.com/Betterment/alchemist/tree/dev/feat/tolerance

Using the temporary version now

We have yet to finish the tests, update the docs, and upload a version to pub.dev, but if you want to try it out in the meantime, change the alchemist dependency in your pubspec.yaml to point to the GitHub repo at that git commit, like so:

dev_dependencies:
  alchemist:
    git:
      url: https://github.com/Betterment/alchemist
      ref: d57c359ceadeae7ebde51639bcf9fbf37bb5f100

After doing so, you will find a tolerance value in PlatformGoldensConfig and CiGoldensConfig. You can set this to any value between 0.0 (the default) and 1.0 (any image will match any image).
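
For example, applying it suite-wide from flutter_test_config.dart could look something like this; a minimal sketch based on the runWithConfig setup from the package README, with the 0.5% value chosen purely for illustration:

import 'dart:async';

import 'package:alchemist/alchemist.dart';

Future<void> testExecutable(FutureOr<void> Function() testMain) async {
  return AlchemistConfig.runWithConfig(
    config: AlchemistConfig(
      // 0.0 keeps exact matching; anything above allows small diffs.
      platformGoldensConfig: PlatformGoldensConfig(tolerance: 0.005),
      ciGoldensConfig: CiGoldensConfig(tolerance: 0.005),
    ),
    run: testMain,
  );
}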

Try it out and let me know how it works for you.

itsJoKr commented 1 year ago

@jeroen-meijer The diff change works, thanks for the fast update.

About that Impeller issue: currently on stable, Impeller is not turned on by default; you need to pass the --enable-impeller flag. There's no such flag for tests, and it would really surprise me if Impeller were somehow enabled by default for tests.

jeroen-meijer commented 1 year ago

@itsJoKr

Thanks for the feedback and the additional info about Impeller. I'll finish up the PR and get a new version up A.S.A.P.

This might take a while, so feel free to use the git reference in your pubspec until then. I'll update the thread once the new version is out.

inf0rmatix commented 1 year ago

Thanks for the quick workaround. Hopefully, we'll soon know the real cause of this issue!

andrzejchm commented 1 year ago

Just FYI, the problem is mentioned in this issue on Flutter's GitHub: https://github.com/flutter/flutter/issues/111739

It looks like Apple Silicon Macs render goldens differently from all other platforms, hence the problem.

soeren-schmaljohann-2denker commented 1 year ago

I'm getting an error on the newest Flutter SDK version when using the tolerance dev branch on CI:

Error: Couldn't resolve the package 'equatable' in 'package:equatable/equatable.dart'.
/opt/hostedtoolcache/flutter/stable-3.10.5-x64/.pub-cache/git/alchemist-9589cafe48f5565093df0737d4ed5a30e4ee1ce8/lib/src/alchemist_config.dart:4:8: Error: Not found: 'package:equatable/equatable.dart'
import 'package:equatable/equatable.dart';
       ^
/opt/hostedtoolcache/flutter/stable-3.10.5-x64/.pub-cache/git/alchemist-9589cafe48f5565093df0737d4ed5a30e4ee1ce8/lib/src/host_platform.dart:3:8: Error: Not found: 'package:equatable/equatable.dart'
import 'package:equatable/equatable.dart';
       ^
/opt/hostedtoolcache/flutter/stable-3.10.5-x64/.pub-cache/git/alchemist-9589cafe48f5565093df0737d4ed5a30e4ee1ce8/lib/src/alchemist_config.dart:66:31: Error: Type 'Equatable' not found.
class AlchemistConfig extends Equatable {
                              ^^^^^^^^^
/opt/hostedtoolcache/flutter/stable-3.10.5-x64/.pub-cache/git/alchemist-9589cafe48f5565093df0737d4ed5a30e4ee1ce8/lib/src/alchemist_config.dart:277:38: Error: Type 'Equatable' not found.
abstract class GoldensConfig extends Equatable {
                                     ^^^^^^^^^
/opt/hostedtoolcache/flutter/stable-3.10.5-x64/.pub-cache/git/alchemist-9589cafe48f5565093df0737d4ed5a30e4ee1ce8/lib/src/host_platform.dart:20:28: Error: Type 'Equatable' not found.
class HostPlatform extends Equatable implements Comparable<HostPlatform> {

hampsterx commented 1 year ago

Sorry to +1, but...

@jeroen-meijer any update here? We have been using the branch for a while but want to get back onto 3.13 :(

I think at this point I would sacrifice my firstborn to get this feature merged in. Much appreciated~

Kirpal commented 1 year ago

@hampsterx In the meantime, I just rebased that branch onto main, so it should work with 3.13.

EArminjon commented 1 year ago

I tested with Flutter 3.13.4 and the Alchemist main branch, and I still see the issue on my CI:

Golden "goldens/ci/clear-and-share-buttons-dark.png": Pixel test failed, 0.01% diff detected.
Golden "goldens/ci/clear-and-share-buttons-light.png": Pixel test failed, 0.03% diff detected.

Despite running the following command, the issue persists :'(

flutter test --tags golden --update-goldens

EArminjon commented 1 year ago

On main I cannot see the tolerance feature, specifically in this file: https://github.com/Betterment/alchemist/blob/main/lib/src/alchemist_config.dart

TesteurManiak commented 1 year ago

On main I cannot see the tolerance feature, specifically in this file: https://github.com/Betterment/alchemist/blob/main/lib/src/alchemist_config.dart

@EArminjon that's because you need to check the dev/feat/tolerance branch.

EArminjon commented 1 year ago

Perfect, CI fixed! :)

import 'dart:io';

import 'package:alchemist/alchemist.dart';

AlchemistConfig get alchemistConfig {
  // 'MY_CI_VARIABLE' is whatever environment variable your CI sets.
  final bool isRunningInCi = Platform.environment.containsKey('MY_CI_VARIABLE');

  return AlchemistConfig(
    platformGoldensConfig: PlatformGoldensConfig(
      enabled: !isRunningInCi,
    ),
    ciGoldensConfig: CiGoldensConfig(
      enabled: isRunningInCi,
      tolerance: 0.1,
    ),
  );
}

LGTM as a feature until a future fix from Flutter!

jamontes79 commented 1 year ago

On main I cannot see the tolerance feature, specifically in this file: https://github.com/Betterment/alchemist/blob/main/lib/src/alchemist_config.dart

@EArminjon that's because you need to check the dev/feat/tolerance branch.

With the latest commit on that branch, running the tests with very_good test gives me the following error:

══╡ EXCEPTION CAUGHT BY FLUTTER TEST FRAMEWORK ╞════════════════════════════════════════════════════
The following _Exception was thrown running a test:
Exception: Failed to set AlchemistFileComparator as the goldenFileComparator.

Since this test has a tolerance value above 0.0, Alchemist must set a custom
comparator to allow for matching goldens with a tolerance value.
However, the current goldenFileComparator is not a LocalFileComparator or
AlchemistFileComparator. Instead, it is a _TestOptimizationAwareGoldenFileComparator.

Alchemist can only set a custom comparator if the current comparator is a
LocalFileComparator or AlchemistFileComparator. User-provided comparators
are currently not supported.

Can someone help me out? Thanks in advance.

meg4cyberc4t commented 1 year ago

Are there any updates? This has become necessary for me...

jamontes79 commented 8 months ago

Any news?

adamhalesworth commented 6 months ago

I experienced the same issue as above. Using the dev/feat/tolerance branch and a tolerance of 0.01, my golden tests pass 🙌. Until the issue is resolved this works fine; just be sure to use the latest ref:

dev_dependencies:
  alchemist:
    git:
      url: https://github.com/Betterment/alchemist
      ref: ab1f2eb49584339be119eaec28a5562f6b325cbc

Bassiuz commented 6 months ago

If you don't want to use a forked version of Alchemist with tolerance added, what worked for me is forcing all our Apple Silicon Macs to use the Intel-based Flutter build.

This is the script I use to download a custom Flutter version and put it into FVM.

#!/bin/sh

echo "Checking x86_64 Flutter version: $1"

FVM_VERSION_DIR="$HOME/fvm/versions"
FVM_X86_64_VERSION="$FVM_VERSION_DIR/custom_$1_x86_64"
FVM_DOWNLOAD="$FVM_VERSION_DIR/flutter_sdk_x86.zip"

# Check if FVM x86_64 is installed
if [ -d "$FVM_X86_64_VERSION" ]; then
  echo "Flutter version $1 is already installed for x86_64"
  exit 0
else
  echo "Installing Flutter version $1 for x86_64"
  curl "https://storage.googleapis.com/flutter_infra_release/releases/stable/macos/flutter_macos_$1-stable.zip" -o "$FVM_DOWNLOAD"
  unzip "$FVM_DOWNLOAD" "flutter/*" -d "$FVM_X86_64_VERSION"
  # shellcheck disable=SC2164
  cd "$FVM_X86_64_VERSION/flutter"
  # shellcheck disable=SC2035
  mv * .* ../
  rm -rf "$FVM_X86_64_VERSION/flutter"
  rm -rf "$FVM_DOWNLOAD"
  exit 0
fi

You could then run ./install-x86_64.sh 3.19.5 on the Apple Silicon machine. If you then run fvm use custom_3.19.5_x86_64, your machine uses this Intel-based Flutter version, which should prevent most differences caused by processor architecture. Don't forget to reset FVM back to the 'normal' version after running tests.

vanlooverenkoen commented 1 month ago

@btrautmann any idea when this would land? We have been using the tolerance branch for quite some time already.

btrautmann commented 1 month ago

@jeroen-meijer did you have any thoughts about merging tolerance to main? I know at one point you had some reservations about doing so, but it seems to be used a lot in practice and it would probably be reasonable to let consumers dictate their own usage of it (whether it's acceptable to them and how much drift from golden images is acceptable).

vanlooverenkoen commented 1 month ago

FYI: I have fixed the branch in my fork (and I'm using that one for now):

https://github.com/vanlooverenkoen/alchemist/tree/dev/feat/tolerance

jeroen-meijer commented 1 month ago

@btrautmann Looking back at this now, I feel that, if people want it, we should allow it. I think it's good practice to be opinionated in libraries, but this seems like a real use case people need (whether because they don't care about minor changes, or because the Flutter framework is being quirky and a 0.00% diff pops up).

However, I think we should discuss what a tolerance setting means in this case and properly spec out the feature.

For example:

  1. Does tolerance: 0.01 mean we can have 1% tolerance across all golden tests (meaning 99 can pass fully and 1 may fail on a single pixel, but the whole suite is still considered valid)? Or does it mean any given golden test may have at most a 1% pixel difference with its golden reference?
  2. Where is this tolerance provided? In the golden test config? On a per-test basis? Both?

I haven't had time to check @vanlooverenkoen's branch yet, but I wanted to throw these questions out there as food for thought.

Anyone in this thread, feel free to chime in on how you'd like to use this feature, so we can implement it in a way that works for everyone (especially considering the many changes we've made to Alchemist over the last few months, including changes to the config and the addition of GoldenTestTheme).

btrautmann commented 1 month ago

Does tolerance: 0.01 mean we can have 1% tolerance across all golden tests (meaning 99 can pass fully and 1 may fail on a single pixel, but the whole suite is still considered valid)?

My personal preference (and how I believe it's implemented on the aforementioned branch) is that the percentage itself applies on a per-test basis but is configured for the entire suite, i.e. each test can be "off" by up to the globally configured percentage without failing.

This feels the most expected to me. I would probably push back on a per-test tolerance because that feels like a pretty big footgun/maintenance nightmare. To me, the primary use case for this tolerance should be minor platform differences (like the one in this PR, which is a 0.64% diff) or quirks in the Flutter framework, neither of which I would expect to be particularly meaningful at the per-test level. If we allowed a per-test configuration, consumers would be scattering these values all over their test suites. While not "technically" an issue, it just feels like a bit of a headache. Happy to be disagreed with though 😅
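
To make that concrete, the semantics described above boil down to something like the following sketch (illustrative Dart, not Alchemist's actual internals; the names and numbers are made up):

// Suite-wide setting, e.g. what CiGoldensConfig.tolerance would hold.
const double suiteTolerance = 0.01; // 1%

// Each golden comparison is checked independently against the same
// suite-wide tolerance; there is no shared "budget" across tests.
bool goldenPasses(double diffFraction) => diffFraction <= suiteTolerance;

void main() {
  print(goldenPasses(0.0064)); // true: a 0.64% diff is within tolerance
  print(goldenPasses(0.02)); // false: a 2% diff in any single test fails
}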

jeroen-meijer commented 1 month ago

Got it. Agreed, that makes sense! I don't see how configuring it for an individual test adds that much value, and it might incentivize people to tune the tolerance for each test in such a way that they can get away with a huge diff.

Nit-pick but interesting to think about: Do we want to throw a WARN log on the screen if a diff was detected > 0 and <= tolerance?

Also, it looks like we can use @vanlooverenkoen's branch as a foundation for this feature, correct? If so, I'm happy to co-author it if needed, or at least review.

btrautmann commented 1 month ago

Nit-pick but interesting to think about: Do we want to throw a WARN log on the screen if a diff was detected > 0 and <= tolerance?

I actually like this idea. Another idea for the wishlist (not sure if it's possible): it'd be sweet if we could somehow track the overall test suite status and warn/throw if all tests passed and were below the configured tolerance. This would mean you could effectively lower your tolerance closer to zero without impacting test success.

_Note: this would both require a hook for test suite completion/result and us being able to identify only alchemist driven tests. The latter may be easy via tags, but I'm not aware of anything providing notification of test suite completion._

RE: the branch, I actually think you and @Kirpal authored the original code which I imagine @vanlooverenkoen has patched on his own! https://github.com/Betterment/alchemist/tree/dev/feat/tolerance

jeroen-meijer commented 1 month ago

RE: the branch, I actually think you and @Kirpal authored the original code

lmao I'm an idiot, you're right. Sweet 👌🏻 I'll submit a PR somewhere next week at the latest (keeping @vanlooverenkoen's changes in mind, if any).

I actually like this idea.

Great, let's implement it and see what the community thinks.

warn/throw if all tests passed and were below the configured tolerance

You mean a system that will basically say "hey, you've allowed your tests to tolerate a 10% diff, but only a 5% diff was detected, so consider lowering your tolerance"?

I don't know if such a feature holds a lot of value (since I believe most users who set the tolerance to anything other than 0.0 will do so in situations where the actual diff percentage may vary from time to time).

However, jumping off that idea, we could consider creating something like an AlchemistTestReport that would be aggregated and returned at the end of the test suite, and give users the option to do with that data whatever they'd like? 🤷🏻
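
For what it's worth, a first cut of such a report could be as small as the sketch below; this type is purely hypothetical and doesn't exist in Alchemist today.

// Hypothetical aggregate of golden test outcomes for one suite run.
class AlchemistTestReport {
  const AlchemistTestReport({required this.diffsByGolden});

  // Observed diff fraction per golden file (0.0 means a pixel-perfect match).
  final Map<String, double> diffsByGolden;

  // The largest diff seen across the suite; useful when deciding whether
  // the configured tolerance can be lowered.
  double get maxDiff =>
      diffsByGolden.values.fold(0.0, (max, d) => d > max ? d : max);
}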

btrautmann commented 1 month ago

I don't know if such a feature holds a lot of value (since I believe most users who set the tolerance to anything other than 0.0 will do so in situations where the actual diff percentage may vary from time to time).

Yeah, my thinking was more the use case of: you bump the tolerance to accommodate a particular test/Widget. Later, this Widget is removed, but your tolerance stays the same. If left that way, you have extra "buffer" between your suite's actual diff and the tolerated diff that you don't need. I could see teams wanting to keep that buffer as slim as possible, and this would allow them to do that. It's similar to linting systems that allow baselining; they often have warnings baked in that tell you when you've baselined a lint violation that no longer exists or no longer violates a particular rule.

However, jumping off of that idea, we could consider creating something like an AlchemistTestReport that would be aggregated and returned at the end of the test suite, and giving the users the option to do with that data whatever they'd like?

That could be a good compromise/v1. If users then said "we'd love to have this fail the build when we're not hitting the tolerance", we could always add that within the same bits of code that produce the report 🤔