flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License
165.16k stars 27.24k forks source link

Golden tests running with impeller like `impeller_Play_AiksTest_CanRenderSweepGradientManyColorsRepeat_Metal` are flaky #124295

Closed jmagman closed 1 year ago

jmagman commented 1 year ago

I'm surprised this iOS embedder memory leak fix kicked off a flutter gold check, and even more surprised it caused Golden file changes have been found for this pull request.

https://github.com/flutter/engine/pull/40957

jmagman commented 1 year ago

The unexpected flutter gold check is still pending and blocking the PR: https://github.com/flutter/engine/pull/40957

keyonghan commented 1 year ago

@Piinks Could you help take a look?

Piinks commented 1 year ago

I believe the image is flaky, I filed https://github.com/flutter/flutter/issues/124277

@mdebbar maintains the engine instance of Gold.

mdebbar commented 1 year ago

I took a look at the failing goldens: https://flutter-engine-gold.skia.org/search?crs=github&issue=40957&negative=true&patchsets=1&positive=true

They fall in 2 buckets:

  1. Images with tiny differences from their golden baseline (e.g. this: new pixel is #C42AA3 vs baseline pixel #C426A3. This is correctly auto-approved by fuzzy.
  2. Images with a pixel that's drastically different from the baseline (e.g. this: new pixel is #FFDBAF vs baseline pixel #6200A2). Look at the highlighted pixel in the center:
New Baseline
image image

Unfortunately, Gold's fuzzy matching doesn't handle this case well.

That said, this feels like a flakiness in impeller that needs to be addressed. The specific pixel sometimes gets painted yellow, and sometimes purple.

chinmaygarde commented 1 year ago

From Triage: We discussed this w.r.t what to do with gold tests that run on results obtained from a specific GPU. We are not going to guarantee pixel perfect results across GPUs (even from a specific vendor). Based on guidance from the Skia GPU team, the amelioration was that the Gold failures be reasoned about out-of-band and neither the autorollers or the presubmit checks block PRs. I believe this has been done already. You should not see a red check mark on your presubmits and the autorollers should not care about this difference. The delta will still be triaged as I said, but out of bank.

and blocking the PR.

Did you see the gold failures trigger a red check or was it in the pending state? Also, was the autosubmit/autoroller refusing to process this?

Dan (OOO) also had WIP where gold would instances would only use a specific GPU type. But I am not sure that work was done. In any case, this shouldn't block PRs.

cc @gaaclarke.

Piinks commented 1 year ago

Dan (OOO) also had WIP where gold would instances would only use a specific GPU type. But I am not sure that work was done.

Would it be easier to specify the GPU as a parameter for the golden file image? Gold supports thousands of params for image tests, we could just have separate images for each GPU by adding the GPU as a key of the image.

Piinks commented 1 year ago

These are the current params I can see we differentiate images for in the engine instance:

image

gaaclarke commented 1 year ago

@Piinks Here is what dan's PR is doing: https://github.com/flutter/engine/pull/41216/files#diff-356abf8d8d08b3ba08c35d6399e2a21d3107aa16c55f4303bb3b8f96195bbcc0R239

The code looks good, but I'm not sure if that's going to give us the result we want in skia gold (it sounds like it?)

Piinks commented 1 year ago

Oh nice, that looks like exactly that, although slightly different, I'll leave mention on the PR. :+1:

chinmaygarde commented 1 year ago

This is WAI. The ameliorations for different GPU returning different results were that the golden runs were being checked out of band, a global conservative fuzz factor applied to gold checks, and the restriction of devices that can perform these gold checks (most notably removing the Intel Iris devices).

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.