Betterment / alchemist

A Flutter tool that makes golden testing easy.
MIT License
251 stars 35 forks source link

fix: Flutter 3.16 make some goldens fail on CI platform #104

Closed orevial closed 3 days ago

orevial commented 10 months ago

Is there an existing issue for this?

Version

0.7.0

Description

Since recent Flutter upgrade to 3.16 I now have issues after regenerating goldens on my machine, some of them fail on CI due to slight differences.

My process:

  1. I run the tests and update the goldens on my MacBook Pro using default config PlatformGoldensConfig(enabled: true). All my golden files are generated correctly.
  2. If I run a test locally on my MacBook Pro without updating the goldens, everything is fine, tests all succeed.
  3. When I push to my CI (running Ubuntu) and goldens tests are run using config PlatformGoldensConfig(enabled: false), then I get a lot (but not all) golden tests failing with a very slight diff percentage.

Now I don't understand what is happening but what I can reproduce for sure is that all golden tests that contain multiple scenarii and hence multiple lines or columns, FAIL.

Multiple columns goldens

Take this golden file as an example that contains a dumb container with a border:

simple-container-one-column

Code:

  goldenTest(
    'Simple container',
    fileName: 'simple-container-one-column',
    builder: () => GoldenTestGroup(
      columns: 1,
      scenarioConstraints: const BoxConstraints(
        minWidth: 100,
        maxWidth: 100,
        minHeight: 50,
        maxHeight: 50,
      ),
      children: [
        GoldenTestScenario(
          name: '100x50 container with blue border',
          child: _separator(Colors.blue),
        ),
      ],
    ),
  );

✅ This golden file is correctly generated and asserted during CI tests.

Now take the same test that just duplicates previous scenario:

  goldenTest(
    'Simple container, two columns',
    fileName: 'simple-containers-two-columns',
    builder: () => GoldenTestGroup(
      columns: 2,
      scenarioConstraints: const BoxConstraints(
        minWidth: 100,
        maxWidth: 100,
        minHeight: 50,
        maxHeight: 50,
      ),
      children: [
        GoldenTestScenario(
          name: '100x50 container with blue border',
          child: _separator(Colors.blue),
        ),
        GoldenTestScenario(
          name: '100x50 container with blue border',
          child: _separator(Colors.blue),
        ),
      ],
    ),
  );

✅On my machine, everything runs as expected:

simple-containers-two-columns

❌ But this time on the CI, I have a slight difference showing at t

he left border of my container for the second scenario (second column) as shown in the following golden failure masked & isolated diffs:

simple-containers-two-columns_maskedDiff

simple-containers-two-columns_isolatedDiff

Multiple lines goldens

Same thing happens for goldens tests that contain a single column but multiple scenarii.

Take this example that runs smoothly on my machine :

checkboxes

❌ Now on the CI, I would get a very subtle one pixel diff at the end of the scenarii separator.

Here is the isolated diff:

checkboxes_isolatedDiff

I've pointed the actual 1 pixel difference with a big red arrow because the diff is very subtle 😅

Conclusion

What is happening here ? I would expect the CI test, that explicitly doesn't display fonts, would work well.

There seems to be an issue with how alchemist package separate the different scenarii in multiple lines or columns after upgrading to Flutter 3.16.

Steps to reproduce

  1. Add the following code:

    import 'package:alchemist/alchemist.dart';
    import 'package:flutter/material.dart';
    
    Widget _separator(
    Color color, [
    String? text,
    ]) =>
      Container(
        decoration: BoxDecoration(
          border: Border.all(color: color, width: 2),
        ),
        height: 25,
        child: text != null
            ? Text(
                text,
                textAlign: TextAlign.center,
              )
            : null,
      );
    
    void main() {
    goldenTest(
      'Simple container',
      fileName: 'simple-container-one-column',
      builder: () => GoldenTestGroup(
        columns: 1,
        scenarioConstraints: const BoxConstraints(
          minWidth: 100,
          maxWidth: 100,
          minHeight: 50,
          maxHeight: 50,
        ),
        children: [
          GoldenTestScenario(
            name: '100x50 container with blue border',
            child: _separator(Colors.blue),
          ),
        ],
      ),
    );
    
    goldenTest(
      'Simple container, two columns',
      fileName: 'simple-containers-two-columns',
      builder: () => GoldenTestGroup(
        columns: 2,
        scenarioConstraints: const BoxConstraints(
          minWidth: 100,
          maxWidth: 100,
          minHeight: 50,
          maxHeight: 50,
        ),
        children: [
          GoldenTestScenario(
            name: '100x50 container with blue border',
            child: _separator(Colors.blue),
          ),
          GoldenTestScenario(
            name: '100x50 container with blue border',
            child: _separator(Colors.blue),
          ),
        ],
      ),
    );
    }
  2. Generate the golden tests (flutter test update-goldens) on a first platform (e.g. a MacBook Pro or Windows machine)
  3. Run the tests on a CI platform (e.g. Ubuntu)
  4. Check for failures

Expected behavior

Golden tests should pass and we shouldn't have diffs that are introduced by alchemist library itself (on its separators).

Screenshots

No response

Additional context and comments

No response

Arthurius commented 9 months ago

Any news on this issue? It's really annoying that some tests keep failing because of this

vanlooverenkoen commented 8 months ago

@Kirpal or @jeroen-meijer sorry that I tag you both personally, but I feel like this is a crucial bug that should be fixed for Flutter 3.16.x, any idea if someone is already taking a look at this issue?

vanlooverenkoen commented 8 months ago

Updating the goldens on Linux results in changed images. But running the tests again on our dev machine (macos) they fail -> which makes sense, because previously our goldens were built using macos. It seems that there were some changes on the goldentests itself.

vanlooverenkoen commented 8 months ago

Might be related to https://github.com/flutter/flutter/issues/51344

vanlooverenkoen commented 8 months ago

Just to be 100% sure @orevial your CI is also Linux right? (a different OS to your dev machine?)

orevial commented 8 months ago

Just to be 100% sure @orevial your CI is also Linux right? (a different OS to your dev machine?)

Yes my CI is a Ubuntu machine!

vanlooverenkoen commented 8 months ago

I feel like flutter or alchemist, I'm currently not sure if it is a flutter issue or an alchemist issue, renders on half pixels and that seem to be different on linux & macos.

master (macos):

image

test (linux):

image

maksedDiff:

image

isolatedDiff:

image

That also explains why some tests are failing and some are succeeding. I am trying some things in this PR (don't mind the successfull status, that was the easiest way to get the failed images archived): https://github.com/vanlooverenkoen/alchemist/pull/1

This log for example shows that not all tests are failing: https://github.com/vanlooverenkoen/alchemist/actions/runs/7500793916/job/20420196325

These are all failing in 3.16 and were working on 3.13

❌ /home/runner/work/alchemist/alchemist/test/smoke_tests/timer_button_smoke_test.dart: smoke test succeeds after tapping button with timer (variant: CI) (failed)
❌ /home/runner/work/alchemist/alchemist/test/smoke_tests/interactions_smoke_test.dart: smoke test succeeds in regular state (variant: CI) (failed)
❌ /home/runner/work/alchemist/alchemist/test/smoke_tests/interactions_smoke_test.dart: smoke test succeeds while pressed (variant: CI) (failed)
❌ /home/runner/work/alchemist/alchemist/test/smoke_tests/interactions_smoke_test.dart: smoke test succeeds while long pressed (variant: CI) (failed)
vanlooverenkoen commented 8 months ago

@btrautmann can you help us with this issue? I feel like whenever something that should be rendered is 1 pixel or smaller it results in failures on CI. As wel as disabling shadow rendering. Instead a border is rendered somewhere (I have not been able to find where exactly this is done) but that also results in differences. (The border happens to be 1pixel or smaller)

vanlooverenkoen commented 8 months ago

@Piinks do you have an idea if something changed in 3.16 that could affect this?

vanlooverenkoen commented 8 months ago

@orevial do you have any tests that are flutter_test only (so no alchemist) where you have the same issues? I can't seem to find anything in alchemist that would fix/cause this bug. But on the other hand I can't find any case where it happens with flutter_test only. 🤯

orevial commented 8 months ago

@orevial do you have any tests that are flutter_test only (so no alchemist) where you have the same issues? I can't seem to find anything in alchemist that would fix/cause this bug. But on the other hand I can't find any case where it happens with flutter_test only. 🤯

That was exactly where I landed when I tried to investigate this issue within alchemist package...

To answer your question: not really because my non-alchemist golden tests run with real fonts enabled, which are known to render differently on different platforms (what I mean is that they will always fail anyway...). I will try and remove the font loading and keep the default Ahem font to see whether my tests are different between my macOs and Ubuntu platforms...

vanlooverenkoen commented 8 months ago

@orevial thanks, Yeah I have no idea where to go from here. And I think I have tagged all people who might know how to fix the issue. I guess at some point there will be more people who need 3.16 support. (which is already out for 2 months already)

orevial commented 8 months ago

Agreed, thanks for your help ! My "workaround" in the meantime is to always run the golden tests on the same platform:

It's not great for local performances because running tests in a Docker image take twice the time as running them directly with Flutter CLI, but at least it make tests repeatable and robust...

vanlooverenkoen commented 8 months ago

Yeah indeed. I think we will move our tolerance up until there is a fix.

@betterment & @verrygoodventures if there is something I need to do or can test let me know I am happy to help. But I am just stuck right now.

orevial commented 8 months ago

Just tested running one of my e2e test that does not use alchemist with default Ahem font, I get some small diffs too. Here is my master image: community_posts_masterImage

And now you can see the isolated diff that was generated in the failures folder: community_posts_isolatedDiff

I don't know if that proves anything, is the problem is really with Flutter 3.16 or if my test could have another setting responsible for the small diff (in particular, I don't know if alchemist modify other settings than the font to make CI-generated independant of the host platform...).

vanlooverenkoen commented 8 months ago

Hmm yeah indeed. I guess I will wait on a proper solution from the alchemist team itself.

maxAtIppen commented 4 weeks ago

Having the same issue.

Flutter: 3.22.2 CI: Ubuntu alchemist: 0.7.0

vanlooverenkoen commented 6 days ago

@btrautmann were you able to reproduce this?

btrautmann commented 6 days ago

@btrautmann were you able to reproduce this?

Yes, as mentioned in https://github.com/Betterment/alchemist/issues/103#issuecomment-2341208297 there are small diffs even when disabling useMaterial3. This is almost certainly to underlying Flutter changes, which is why I feel comfortable re-generating Alchemist's smoke test goldens.

There is only one diff (see below) that I don't really understand, but again, nothing changed on Alchemist's part in the upgrade to 3.16.0, so this is likely also related to Flutter. I think a matrix of tests that runs daily will help us to better isolate when these changes occur. If we start failing we can look at what's landed on the beta channel and the culprit will likely be more obvious. Running on each commit to beta may be even better, just not sure how feasible that is.

Note the difference in height of the individual lines:

Isolated: blocked_text_image_reference_isolatedDiff

Masked: blocked_text_image_reference_maskedDiff

Master: blocked_text_image_reference_masterImage

Test: blocked_text_image_reference_testImage

CC @jeroen-meijer who may have a better read on why this may be happening.

btrautmann commented 6 days ago

@vanlooverenkoen you can see the diff of re-generated goldens in https://github.com/Betterment/alchemist/pull/118 if you want to take a peek.

btrautmann commented 3 days ago

Update: We recently released 0.8.0 which has a minimum version of 3.16.0. As part of that (and as mentioned in https://github.com/Betterment/alchemist/issues/103#issuecomment-2341208297), we re-generated the golden images to be compatible with the latest Flutter changes. The goal is to do that as seldomly as necessary in the future, but here it was a necessity. I've opened https://github.com/Betterment/alchemist/pull/123 as a way of getting more advanced notice of these types of breaks in the future, but will need to find a way to decouple alchemist from the Material changes that keep having downstream impacts before we can enable that workflow, as mentioned in https://github.com/Betterment/alchemist/issues/120.

I'm going to close this issue in the meantime.