Automattic / wp-calypso

The JavaScript and API powered WordPress.com
https://developer.wordpress.com
GNU General Public License v2.0

Implement Visual regression framework. #51916

Open bsessions85 opened 3 years ago

bsessions85 commented 3 years ago

Per: pbAok1-1Oc-p2

Spin up visual regression infrastructure and add tests for layouts in the editor and on the frontend, per theme.

bsessions85 commented 3 years ago

Trying BackstopJS to start.

bsessions85 commented 3 years ago

I've been trying to run visual regression tests on the page templates in the editor, and I am seeing a lot of inconsistency there. Here is an example of some results. The image shows the two versions I see regularly; the output seems to bounce back and forth between them. Occasionally I also see other differences. This leads me to believe that testing in the editor may be too flaky to be useful. I'll keep trying to see if delays or anything else will help, but I wanted to post an update here.

(Image: Screen Shot 2021-04-19 at 2 02 33 PM)

bsessions85 commented 3 years ago

It is also worth noting that I am running these within a Docker container so that we don't get differences between environments.

simison commented 3 years ago

cc @ockham, who I believe was looking into visual diff testing of templates or blocks in core. Have you noticed any similar flakiness?

kwight commented 3 years ago

Hm, are those differences from the theme of the site? It looks like the test site has a different theme, which could be loading editor styles (accounting for the differences).

bsessions85 commented 3 years ago

@kwight, good question. I'll look into that! Thanks.

bsessions85 commented 3 years ago

The theme was part of the problem. I didn't realize that different themes changed how it looks in the editor. It now looks better, but is still flaky. Things like images being slightly different or sometimes not loading are the next challenge.

kwight commented 3 years ago

> I didn't realize that different themes changed how it looks in the editor.

It's pretty scattered; themes can enqueue styles into the editor to make it look more like the front-end – some themes do, some don't. Some do it well, some don't.

> Things like images being slightly different or sometimes not loading are the next challenge.

Does that have something to do with the origin? Some could be coming from URLs, Photon, or the media library directly. I know blocks like the Gallery block have pretty awkward image handling by necessity, depending on environment.

Is your work in a PR somewhere?

simison commented 3 years ago

> I didn't realize that different themes changed how it looks in the editor.

It's also part of the equation that frequently breaks visuals in the editor because it's so hard to notice those differences when upgrading themes (think e.g. alignment breaking, extra spacing appearing).

bsessions85 commented 3 years ago

@kwight Here is an initial PR. Running the test command will give you a report of the differences. Re-running that command 2 or 3 times usually will yield failures of some sort. https://github.com/Automattic/wp-calypso/pull/52161

kwight commented 3 years ago

@bsessions85 Oh sweet, I'm curious to give it a run.

What are your early impressions at this point? Looks promising, or otherwise?

bsessions85 commented 3 years ago

> What are your early impressions at this point? Looks promising, or otherwise?

At this point I don't think it is going to work well for the editor testing that we were hoping to get. Things just don't load in there consistently enough to make it an effective tool. Every second or third run will have something just slightly different from the reference file, and it will fail. For example, in the image below, 3 of the 4 tests passed, but this one failed because things are indented differently. The next run would probably pass, though, so updating the reference isn't the fix. The rendering itself varies from one run to the next.
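For context on why a slightly different render fails a run: screenshot comparison tools generally count differing pixels and compare that share against a tolerance (BackstopJS exposes this as the `misMatchThreshold` scenario option). The sketch below is a simplified illustration of that idea, not BackstopJS's actual comparison code. `mismatchPercent` and `passes` are hypothetical names, and the inputs are assumed to be flat RGBA arrays of equal size.

```typescript
// Hypothetical helper: percentage of pixels that differ between two
// same-sized RGBA buffers (4 values per pixel). Illustration only.
function mismatchPercent(reference: number[], test: number[]): number {
  if (reference.length !== test.length || reference.length % 4 !== 0) {
    throw new Error("buffers must be the same size and hold whole RGBA pixels");
  }
  const pixelCount = reference.length / 4;
  if (pixelCount === 0) {
    return 0;
  }
  let differing = 0;
  for (let i = 0; i < reference.length; i += 4) {
    // A pixel counts as different if any of its four channels differs.
    if (
      reference[i] !== test[i] ||
      reference[i + 1] !== test[i + 1] ||
      reference[i + 2] !== test[i + 2] ||
      reference[i + 3] !== test[i + 3]
    ) {
      differing++;
    }
  }
  return (differing / pixelCount) * 100;
}

// A run passes when the mismatch stays at or below the allowed percentage.
function passes(reference: number[], test: number[], thresholdPercent: number): boolean {
  return mismatchPercent(reference, test) <= thresholdPercent;
}
```

A small tolerance can absorb anti-aliasing noise, but a layout shift such as the indentation change described above moves many pixels at once, so raising the threshold would likely hide real regressions before it hid this kind of flakiness.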

(Image: Screen Shot 2021-04-22 at 7 27 50 AM)

simison commented 3 years ago

Very surprising! Gutenberg version or theme didn't change in-between, right?

I would expect the output to stay consistent, of course; it's a WYSIWYG editor after all. It would be good to get to the bottom of why it keeps changing.

bsessions85 commented 3 years ago

> Very surprising! Gutenberg version or theme didn't change in-between, right?

Not unless the Gutenberg version is in an A/B test or something. The theme for sure isn't changing.

kwight commented 3 years ago

I tested this for a while today and got basically the same results as @bsessions85. Almost all of the problems, though, were with one template, Bowen. (I also got a rare problem with Rivington that appears to be the snapshot being taken before all of the images can load into the slider. I don't know if this is something that can be "fixed" by waiting a little longer.)
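A minimal sketch of what "waiting a little longer" could look like, assuming the BackstopJS scenario options `readySelector` and `delay`. The URL and selector are placeholders and are not taken from the PR. `allImagesLoaded` is a hypothetical check that an `onReadyScript` could poll before the capture; it relies only on the standard `complete` and `naturalWidth` image properties.

```typescript
// Subset of BackstopJS scenario options used in this sketch; the real
// configuration accepts more fields.
interface ScenarioSketch {
  label: string;
  url: string;
  readySelector?: string; // wait until this selector exists on the page
  delay?: number; // then wait this many milliseconds before capturing
}

// Hypothetical scenario: the URL and selector below are placeholders.
const rivingtonScenario: ScenarioSketch = {
  label: "Rivington",
  url: "https://example.com/rivington-template",
  readySelector: ".slider img",
  delay: 2000,
};

// Minimal shape of the image properties the check needs.
interface ImageLike {
  complete: boolean;
  naturalWidth: number;
}

// True only when every image has finished loading and decoded to a
// non-zero width (a broken image reports complete with naturalWidth 0).
// An empty list counts as loaded.
function allImagesLoaded(images: ImageLike[]): boolean {
  return images.every(function (img) {
    return img.complete && img.naturalWidth > 0;
  });
}
```

A fixed `delay` is blunt: it slows every run and can still be too short on a slow one, which is why polling an explicit condition may be the more reliable option if the empty slider keeps recurring.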

The Bowen reference itself is actually wrong – the correct appearance is the test shot in the example of a failure above. However, I re-generated the references, got a correct Bowen reference, and then proceeded to have about the same number of failures. It does seem to be something weird with the Bowen template itself, though: when I swapped it out for team, I was able to get an impressive run of full passes (broken only by another Rivington empty slider).

Fonts (and other styles?) are an issue in the testing too. They don't appear to be getting loaded by the testing instance (nor by the e2e testing site, for that matter).

kwight2021.wordpress.com: (Image: Screen Shot 2021-04-22 at 2 41 22 PM)
e2eflowtesting3.wordpress.com: (Image: Screen Shot 2021-04-22 at 2 40 57 PM)
Backstop: (Image: Screen Shot 2021-04-22 at 2 46 20 PM)

I feel like this is pointing to issues with the page template system plus style enqueuing, but I have no concrete code to point to or anything (I think this impression is from also seeing different thumbnails for the templates at different times?).

I noticed the test site used by Backstop is on the (now quite old) Twenty Fifteen theme. I mean, it shouldn't matter that it isn't a modern theme (and the default themes are better maintained than any others I believe), but it made me wonder if newer themes might handle blocks better. ¯\_(ツ)_/¯

bsessions85 commented 3 years ago

So it turns out there is a bug in the latest version of BackstopJS that causes that problem 🤦 I used an older version and everything is looking much better!

I took it a step further and loaded the editor outside of the iframe so that I can capture the whole template in the editor. I am now able to run tests against all the templates, and they pass!
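As a rough sketch of running the same check against every template, one scenario per template could be generated from a list. The template names below are only the ones mentioned in this thread, and the URL shape is a hypothetical placeholder, since the actual editor URL used in the PR is not shown here. `buildScenarios` is an illustrative name, not part of BackstopJS.

```typescript
// Minimal scenario shape for this sketch (label and url only).
interface TemplateScenario {
  label: string;
  url: string;
}

// Template names mentioned in this thread; a real suite would list more.
const templates: string[] = ["Bowen", "Rivington", "team"];

// Hypothetical URL shape: baseUrl plus a "template" query parameter.
function buildScenarios(baseUrl: string, names: string[]): TemplateScenario[] {
  return names.map(function (templateName) {
    const slug = templateName.toLowerCase();
    return {
      label: "editor-" + slug,
      url: baseUrl + "?template=" + encodeURIComponent(slug),
    };
  });
}
```

Generating the list keeps each template's reference image separate, so a failure report points at a single template rather than at the suite as a whole.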

I'm attaching the report so anyone can see it if they want to. html_report.zip

Next step is to get it cleaned up and set it up to run in CI.

griffbrad commented 3 years ago

@Automattic/team-calypso-platform I wanted to loop you folks in here because this could be helpful for Gutenberg release testing and as a complement to e2es in some situations.

@scinos I know you’ve got experience with visual regression testing, so wanted you to have the chance to evaluate the approach here so far and provide input.