cashapp / paparazzi

Render your Android screens without a physical device or emulator
https://cashapp.github.io/paparazzi/
Apache License 2.0
2.22k stars 210 forks source link

Rendering differences between platforms seems to have increased with 1.3.4 #1465

Open VirtualParticle opened 1 month ago

VirtualParticle commented 1 month ago

Description In previous versions, there have been slight variations between the Paparazzi screenshots generated on Linux CI runs and local MacOS runs. This required setting a maximum percent difference above zero, although I was able to set it very low (0.00005%) and still have it work. The primary difference seems to be around text anti-aliasing and color blending more generally.

In the latest version, however, it seems like the difference has increased. It might be something with the new version of layoutlib. Seems like this is happening on the Paparazzi repo itself too: https://github.com/cashapp/paparazzi/pull/1314/files#r1544924645

Steps to Reproduce

Here's an example of a test with a screenshot originally generated on MacOS and failing on Linux with Paparazzi 1.3.4 (this has a difference of 0.007134%):

@Test
fun example() {
    paparazzi.snapshot {
        Box(
            modifier = Modifier.background(Color(0xFF000033))
        ) {
            Text("ExampleText", color = Color.White)
        }
    }
}

image

With Paparazzi 1.3.3, the same test passes on Linux CI with the screenshot generated on MacOS, even with the maxPercentDifference set to 0.

When the accessibility overlay is enabled, the error increases to 0.083399%.

image

Expected behavior I'd expect the tests to be identical on all platforms but that wasn't the old behavior either. Acceptable behavior would just be to have the discrepancy be small enough to ignore. To deal with the higher difference scenarios, we'd have to set the maxPercentDifference to be high enough where actual bugs may be missed.

Additional information:

peterdk commented 1 month ago

We are having the same issue. PR to bump to 1.3.4 fails on CI (Linux), while on our developer Macbooks no verify step fails and also record step does not generate 'new' screenshot (after initial re-record). So this blocks us from going to 1.3.4, cause we need this in our CI.

Of our 300 screenshot tests, 3 fail now consistently.

jrodbx commented 1 month ago

So this blocks us from going to 1.3.4, cause we need this in our CI.

Have you experimented with values for maxPercentDifference?

hanna-h commented 1 month ago

So this blocks us from going to 1.3.4, cause we need this in our CI.

Have you experimented with values for maxPercentDifference?

As @VirtualParticle rightfully mentioned:

To deal with the higher difference scenarios, we'd have to set the maxPercentDifference to be high enough where actual bugs may be missed.

jrodbx commented 1 month ago

As @VirtualParticle rightfully mentioned:

To deal with the higher difference scenarios, we'd have to set the maxPercentDifference to be high enough where actual bugs may be missed.

understood, but do @peterdk and @VirtualParticle work on the same project? if not, i'd be curious to know if the maxPercentDifference might work for @peterdk.

VirtualParticle commented 1 month ago

As @VirtualParticle rightfully mentioned:

To deal with the higher difference scenarios, we'd have to set the maxPercentDifference to be high enough where actual bugs may be missed.

understood, but do @peterdk and @VirtualParticle work on the same project? if not, i'd be curious to know if the maxPercentDifference might work for @peterdk.

We do not work on the same project.

mshearer123 commented 1 month ago

Same issue, different project. Seems mostly on bottom sheets and progress indicator. have tried the animation rule

peterdk commented 1 month ago

Have you experimented with values for maxPercentDifference?

I'm very reluctant to start working around image differences by raising a error threshold. Especially if it's caused by OS differences.

BraisGabin commented 4 weeks ago

With Paparazzi 1.3.2 I had a suite of 1428 tests passing with a maxPercentDifference = 0. When bumping to 1.3.4 1344 of those 1428 fail because of the difference between the screenshots generated on Mac and the ones generated on Linux. Even moving the maxPercentDifference to 0.01 I had some of those tests failing.

peterdk commented 4 weeks ago

@BraisGabin did you do a re-record after bumping? Cause there are (unfortunately) sometimes minor changes when bumping paparazzi. Cause 1428 failures is quite high, to only link that to the difference between mac and linux.

BraisGabin commented 4 weeks ago

Yes, I did.

As I said on 1.3.2 I was running all my tests with maxPercentDifference = 0 without any problem. With 1.3.2 I always got green tests recording on Mac and then verifying on Linux.

Now with 1.3.4 I got all those failing tests. If I change the maxPercentDifference to 0.01 I get less failing tests but I still get some.

dniHze commented 3 weeks ago

It's the same story for us at Cuvva with loads of failures between Mac/Arm64 and Linux/x86. Just like others, we got loads of screenshots regenerated after the upgrade (from 1.3.1 to 1.3.4). I'm not sure if it has something to do with Paparazzi itself, as it could be the layoutlib doing something wonky on different platforms/architectures.

rafsanjani commented 3 weeks ago

We are having the same issue as well which is preventing us from upgrading to 1.3.4. We have regenerated the screenshots but there are still differences on our linux runners. The problem is mostly caused by text anti-aliasing when the delta images are observed.

linean commented 3 weeks ago

Same issue here; the max difference is about 0.2%. Unfortunately, that's too high and could introduce bugs. Our ideal range was 0.05%.

ReginFell commented 3 weeks ago

We have pretty much the same issue. After upgrading from 1.3.3 to 1.3.4, a lot of screenshots were updated on Mac, but they fail on the CI (Linux). With maxPercentDifference = 0.005, they fail, but with maxPercentDifference = 0.1, it works, although it creates a huge threshold for issues.

savvisingh commented 3 weeks ago

Having the same issue, is there any other solution rather than increasing maxPercentDifference, Can we expect a fix for this?

BraisGabin commented 3 weeks ago

Having the same issue, is there any other solution rather than increasing maxPercentDifference, Can we expect a fix for this?

My work around, right now, is just not upgrade to this version. It's a pity, I really want to verify gifs but 🤷

jrodbx commented 2 weeks ago

Can we expect a fix for this?

I'm not sure what the root cause is yet, so hard to promise a fix yet, but I'll be investigating this issue this week.

gabrielittner commented 1 week ago

@jrodbx I've got a very simple reproducer for a platform difference, just having a semi transparent box.

Column(
  Modifier
    .background(Color.White)
    .fillMaxSize()
) {
  Box(
    Modifier
      .fillMaxSize()
      .padding(48.dp)
      .background(Color.Black.copy(alpha = 0.5f))
  ) {
  }
}

I've put it into a test and the CI fails on Linux but succeeds on macOS (the committed screenshot was generated on macOS): https://github.com/cashapp/paparazzi/commit/6f50b720c2e086f3457905c68730d37ad085d4b5 https://github.com/freeletics/paparazzi/actions/runs/9592840917

jrodbx commented 1 week ago

FYI, still investigating, but wanted to share notes:

For this investigation, I've been using the snippet provided in https://github.com/cashapp/paparazzi/issues/1465#issue-2326059848 and had already been investigating before @gabrielittner's simpler repro here.

First, I ran the tests in this branch on JDK {17,18} x {linux, mac, win} x {legacy view, compose} toolkits, using the version @ SHA = 7ca1390. This is the SHA right before Iguana merged.

All 12 files (2 jdks x 3 os x 2 toolkits) are identical. This is where we want to be.

After Iguana, JDKs still have no impact (whew...), but OS platforms clearly do.

Theory 1

Hypothesis: This commit (Use premultiplied alpha images for layoutlib) is the root cause for the rendering change Result: FALSE While analyzing the decompiled class for RenderSessionImpl in Iguana's LayoutLib shows that the BufferedImage type has changed from type 2 (TYPE_INT_ARGB) to type 3 (TYPE_INT_ARGB_PRE), supplying an ImageTransformation via SessionParams in order to reverse this by copying the type 2 image into a new image of type 3 and diffing the two shows that the two images are still identical (presumably because this particular image is fully opaque).

Theory 2

Hypothesis: An additional change is needed to account for recomposition which would allow pending work to stabilize. Result: FALSE This was a bit of a guess attempt, given that I know @geoff-powell had been looking into this. I cherry-picked from Geoff’s branch which attempts to solve this. However, it does not impact the affected images for this repro. In retrospect, this was a silly trial and the outcome makes sense as the repro snippet does not appear to do anything special that would benefit from a second layout pass.

Theory 3

Hypothesis: The view hierarchies' metadata are slightly different, so different inputs result in different outputs. Result: FALSE TL;DR I copied the sources of https://github.com/square/radiography and https://github.com/square/curtains into a branch and analyzed the hierarchies. This was the point where I also decided to write a legacy view (vs a Composable) version of the test, since legacy view hierarchies seemed easier to analyze recursively to peek at its metadata.

Theory 4 (ongoing)

Hypothesis: The way in which the hierarchy gets rasterized is the problem.

This commit fundamentally changed how drawing occurs in Layoutlib (released in Iguana).

When Paparazzi is ready to render the image from a view hierarchy, it calls:

// PaparazziSdk#L302 
val result = renderSession.render(true)

which calls:

// RenderSessionImpl#252
return this.renderAndBuildResult(freshRender, false);

Later down the road, when the native image has been captured, it's copied into the image buffer that we ultimately write to the file system:

// RenderSessionImpl#renderAndBuildResult
        int[] imageData = ((DataBufferInt)this.mImage.getRaster().getDataBuffer()).getData();
        Image.Plane[] planes = this.mNativeImage.getPlanes();
        IntBuffer buff = planes[0].getBuffer().asIntBuffer();
        int len = buff.remaining();
        buff.get(imageData, 0, len);

In a debugger, viewing imageData in the local stack frame shows the captured bitmap screenshot.

What happens in between these two points differs from Hedgehog to Iguana...

Pre-Iguana

// RenderSessionImpl#347
        renderResult = renderAndBuildResult(this.mViewRoot, this.mRenderer);
    private Result renderAndBuildResult(ViewGroup viewRoot, HardwareRenderer renderer) {
        AttachInfo_Accessor.dispatchOnPreDraw(viewRoot);
        RenderNode node = viewRoot.updateDisplayListIfDirty();
        renderer.setContentRoot(node);
        renderer.createRenderRequest().syncAndDraw();
        return Status.SUCCESS.createResult();
    }
// LayoutlibRenderer#21
    public void setContentRoot(RenderNode content) {
        RecordingCanvas canvas = this.mRootNode.beginRecording();
        canvas.scale(this.scaleX, this.scaleY);
        canvas.enableZ();
        canvas.drawColor(0, BlendMode.CLEAR);
        if (content != null) {
            canvas.drawRenderNode(content);
        }

        canvas.disableZ();
        this.mRootNode.endRecording();
    }

Post-Iguana

// RenderSessionImpl#335
        this.mRenderer.draw(this.mViewRoot);
// LayoutlibRenderer#25
        this.draw(viewGroup, rootView.mAttachInfo, new ThreadedRenderer.DrawCallbacks() {
// ThreadedRenderer#374
    void draw(View view, View.AttachInfo attachInfo, DrawCallbacks callbacks) {
        attachInfo.mViewRootImpl.mViewFrameInfo.markDrawStart();
        this.updateRootDisplayList(view, callbacks);
// ThreadedRenderer#355
    private void updateRootDisplayList(View view, DrawCallbacks callbacks) {
        ...
        this.updateViewTreeDisplayList(view);
        ...
            RecordingCanvas canvas = this.mRootNode.beginRecording(this.mSurfaceWidth, this.mSurfaceHeight);

            try {
                int saveCount = canvas.save();
                canvas.translate((float)this.mInsetLeft, (float)this.mInsetTop);
                callbacks.onPreDraw(canvas);
                canvas.enableZ();
                canvas.drawRenderNode(view.updateDisplayListIfDirty());
                canvas.disableZ();
                callbacks.onPostDraw(canvas);
                canvas.restoreToCount(saveCount);
                this.mRootNodeNeedsUpdate = false;
            } finally {
                this.mRootNode.endRecording();
            }
        ...

TL;DR as the commit message calls out, more of the execution flow is moved out of the Layoutlib shim code and into the actual Android rendering code, which is theoretically the ideal approach. But somehow, that also appears to be causing the rendering differences...?

I would ideally like to debug the canvas draw native calls that call into the layoutlib native dylib on Mac OS, but I don't know and I think it involves having the debug symbols? And I haven't successfully been able to build Layoutlib from source to maybe work around that. Any advice from the community on how to achieve this?

In the meantime, using the Paparazzi test rule as an Android platform bootstrap (thanks to Layoutlib), I've found that I can write to a RecordingCanvas and create RenderNodes directly in a unit test, so I may try inlining the above code to see if I can come up with a minimal repro to file against Google's Layoutlib team in request for more info.

jrodbx commented 1 week ago

@gabrielittner one theory I also wanted to try/dismiss was whether I messed up the Iguana migration. I recall you had a commit where you had tweaked the withTime() argument with a bespoke offsetting of Choreographer_Delegate.sChoreographerTime in order to get your version working. I ultimately went with a different approach, but if you still have that branch/commit somewhere (Github history has since cleaned it up from your original PR), it would be worth trying out.

kozaxinan commented 1 week ago

We had the exact background color with alpha usage. We mitigated the paparazzi issue using brush and alpha instead of color with alpha. It is not great to change production code to make the test happy but it is better than ignoring the tests

I wanted to share this working version of alpha usage. It might help you create different hypotheses.

  .background(
      brush = SolidColor(Color.Black),
      alpha = 0.5f,
  )
gabrielittner commented 1 week ago

@jrodbx this was the last state of my branch from before your iguana PR was merged https://github.com/cashapp/paparazzi/compare/master...freeletics:paparazzi:layoutlib-iguana It did also have platform differences though as you can see in the last commit

jrodbx commented 1 week ago

this was the last state of my branch from before your iguana PR was merged

Ok, confirmed that incorporating the approach in that branch does not prevent the regression found in this issue.

jrodbx commented 1 week ago

@jrodbx I've got a very simple reproducer for a platform difference, just having a semi transparent box. ... I've put it into a test and the CI fails on Linux but succeeds on macOS (the committed screenshot was generated on macOS): 6f50b72 https://github.com/freeletics/paparazzi/actions/runs/9592840917

@gabrielittner i've incorporated your repro and that of @kozaxinan into my two working branches (please check my work!):

...and I'm not seeing any differences in the outputs across OS platforms when I test with pre-Iguana and post-Iguana bases.

One thing to call out is in both of my test branches, I commented out scaling of the bitmap to focus on the Iguana effects themselves. Could that be what's accounting for the differences in your respective observations?