garris / BackstopJS

Catch CSS curve balls.
http://backstopjs.org
MIT License

GH Actions tests -- what are they testing? #1533

Open garris opened 6 months ago

garris commented 6 months ago

Hey @dgrebb, I was looking at the GH Actions test output and saw that the npm run sanity-tests script is detecting mismatch errors. My question is -- what exactly are we testing by running these? What kind of error would these GH Actions tests find?

[screenshot: GH Actions test output showing mismatch errors from the sanity tests]

Thanks!

dgrebb commented 6 months ago

I'm not sure I follow. The idea was to run sanity tests to ensure the project runs as expected, basically following the README to ensure it runs after any changes have been made for a feature, fix, etc.

Are you saying sanity tests are supposed to pass 100% without mismatches? I thought the mismatches were intentional.

garris commented 6 months ago

This is a confusing topic. This discussion falls under the peculiar ontology of test-app testing. 😛

Anyway. Here's my understanding...

As far as the GH automation knows, the sanity-test is "passing". (I think that is always the case since that is what the last npm logout command will return? Is that right?)

So, assuming that GH is not getting (or is otherwise ignoring) BackstopJS return errors, under what conditions would the GH automated test actually fail? Would it fail if it couldn't find Chrome, for example? What are those cases?
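To make the question concrete: in a plain shell script without `set -e`, only the exit status of the last command decides whether the step succeeds, so a script shaped like the sketch below would swallow a BackstopJS mismatch (the commands are illustrative, not the actual sanity-test script):

```sh
# Illustrative only -- not the actual sanity-test script.
backstop test --config=backstop.json   # exits non-zero when mismatches are found...
npm logout                             # ...but only this last command's exit status is reported
```

Adding `set -e` at the top of the script, or chaining the commands with `&&`, would be one way to let the earlier non-zero exit reach the GH Actions step.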

Hope this helps explain my question a bit.

dgrebb commented 6 months ago

Yeah -- testing a testing product definitely isn't straightforward. The intent was to provide a quick view into PR health and make merging decisions easier.

I'll do some more research into how one can test "expected" failures vs. a total crash-and-burn -- e.g. if someone were to wipe the entirety of the cli directory and submit a PR 🙃.
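As a starting point, here's a hedged sketch of how an "expected mismatch" could be separated from a real crash in CI -- the script name and report path are assumptions, not the project's current layout:

```sh
# Sketch only: treat "report produced, some mismatches" as expected,
# and treat "no report at all" as a genuine failure.
REPORT=backstop_data/json_report/jsonReport.json   # assumed report location

if npm run sanity-tests; then
  echo "sanity tests passed with no mismatches"
elif [ -f "$REPORT" ]; then
  echo "mismatches reported -- expected for the demo scenarios"
else
  echo "sanity tests crashed before producing a report" >&2
  exit 1
fi
```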

Please let me know if there are any specific use cases you have in mind and I'll add them to the backlog.

dgrebb commented 6 months ago

Also, @garris, if you'd rather I make everything fail by default, let me know -- happy to do that as well.

dgrebb commented 6 months ago

@garris I have an idea! After I work through 'headless: new', I'm going to play around with some test fixtures of the report JSON. We can validate the sanity/smoke test output and use that as pass/fail criteria.

garris commented 6 months ago

Awesome -- yes. This was basically where I was going with it too.

Just at a gut level -- it seems the highest-confidence thing we could do is go the 100%-passing route.

The only hitch, this being visual regression, is that the reference files need to match the environment.

So that kind of points me toward the Docker sanity and smoke tests, since, theoretically, those fixtures are somewhat reliable across environments. I am not so concerned with testing a fail case -- but I'm open to thoughts on that.
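For what it's worth, BackstopJS's --docker flag is one way to pin rendering to a consistent environment; a rough sketch of how the CI step could use it (the exact commands are an assumption, not the current setup):

```sh
# Sketch only: render inside the BackstopJS Docker image so reference
# screenshots match across local machines and CI runners.
backstop reference --docker   # (re)generate reference screenshots in the container
backstop test --docker        # run the comparison in the same rendering environment
```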

FWIW: some of the smoke tests are unreliable (my fault 😞) -- we'd probably need to tighten those up before using them in CI.

Anyhow -- thanks so much for all your effort! I am happy to collaborate as your schedule allows.

Cheers!

garris commented 6 months ago

Oh -- I just realized you were talking about JSON test fixtures for the UI. That is a great idea too!

dgrebb commented 6 months ago

tl;dr: we're going to use jq and diff, which are built in, to test targeted pieces and the overall shape of the generated reports.json after the various npm tests are run.

Here's a sneak-peek at what that might look like:

[screenshot: sneak peek of the proposed jq/diff checks against the generated report JSON]
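In rough shell terms, that kind of check might look like the sketch below (the paths, the fixture file, and the exact report fields are assumptions, not a description of the real report shape):

```sh
# Sketch only: extract a stable subset of the generated report and diff it
# against a checked-in fixture; a non-empty diff exits non-zero and fails CI.
REPORT=backstop_data/json_report/jsonReport.json      # assumed report location
FIXTURE=test/fixtures/expected-sanity-report.json     # hypothetical fixture

# Keep only fields that should be deterministic (scenario label + pass/fail),
# so timestamps and machine-specific paths don't cause false failures.
jq '[.tests[] | {label: .pair.label, status: .status}]' "$REPORT" > /tmp/actual.json

diff /tmp/actual.json "$FIXTURE"
```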

I've been working on this, and am making good progress, but have tumbled down the rabbit hole.

Stepping away for a few days to let my discoveries marinate, but there are tons of iterations on the workflows below.