Closed: Jesus89 closed this issue 6 years ago.
Since this domain is hard to test, we must establish a QA process. This involves:
We can also try to automate this process using some screenshot testing utility.
I'm OK with jasmine and karma :rocket:
For the e2e/acceptance testing, Exquisite seems to be a good option, but I would look for alternatives in order to be sure that there is nothing better.
We agreed on having an `examples` folder with unitary examples for every use case, and on using automated screenshot testing to compare reference images against newly taken ones.
Every machine is different, and there are significant rendering differences between Linux and OSX that make our tests fail. Two proposals are on the table:
1. Having different references for every platform. This means using the Travis/CI screenshots as the reference set; if you want to run the acceptance tests locally, you have to take your own reference screenshots. Taking reference screenshots is a one-time process.
2. Increasing the tolerance of the image-diff algorithm, so that small differences between platforms are ignored and a single reference set works for everyone.
I suspect that we are going to need a threshold so high that the tests will become useless and will always pass. That's the reason why I lean towards option 1 (per-platform references).
cc @Jesus89 @davidmanzanares
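For illustration, the two proposals could be wired into a single check like the minimal sketch below, assuming a pixelmatch-based comparison with pngjs for decoding (both are assumptions at this point in the thread, though pixelmatch comes up later); the `test/references/<platform>` layout, the raised threshold, the helper name, and the 0.3% pixel budget are all illustrative, not the project's actual setup.

```js
// Sketch only: not the project's real harness.
const fs = require('fs');
const path = require('path');
const { PNG } = require('pngjs');         // assumed dev dependency
const pixelmatch = require('pixelmatch'); // assumed dev dependency

// Proposal 1: per-platform reference screenshots
// ('linux' on Travis/CI, 'darwin' on OSX).
const referenceDir = path.join('test', 'references', process.platform);

// Proposal 2: a single reference set with a raised per-pixel tolerance
// (pixelmatch's default threshold is 0.1).
const THRESHOLD = 0.2; // illustrative value

// Compares a freshly taken screenshot against its reference.
// Assumes both images have the same dimensions.
function screenshotMatches(name, actualPath) {
  const ref = PNG.sync.read(fs.readFileSync(path.join(referenceDir, name)));
  const act = PNG.sync.read(fs.readFileSync(actualPath));
  const mismatched = pixelmatch(ref.data, act.data, null,
                                ref.width, ref.height,
                                { threshold: THRESHOLD });
  // Accept up to ~0.3% differing pixels (a margin discussed below).
  return mismatched <= ref.width * ref.height * 0.003;
}

console.log(screenshotMatches('example.png', 'out/example.png'));
```

A real harness would pick one of the two knobs rather than both: per-platform reference sets make the strict default threshold viable, while a single shared reference set relies on the raised tolerance.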
> I suspect that we are going to need a threshold so high that the tests will become useless and will always pass.
I don't think so.
If I was gonna do this, I would use ImageMagick comparison metrics instead of local references because it's easier to test on new machines and environments.
Since we've extensively discussed this in other channels, I'm not gonna block a PR with the local references approach, but I don't think it's the right approach.
@davidmanzanares I think we should run some tests with the threshold approach.
What metrics do you suggest?
@IagoLast PSNR should work well enough, but ImageMagick has many: https://www.imagemagick.org/script/command-line-options.php#metric
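For what it's worth, such a PSNR check can be scripted from Node. A minimal sketch, assuming ImageMagick's `compare` binary is on the PATH (ImageMagick 7 ships it as `magick compare`), with hypothetical file names and an illustrative cutoff:

```js
// Sketch only: shelling out to ImageMagick's `compare` for a PSNR score.
const { spawnSync } = require('child_process');

function psnr(referencePath, actualPath) {
  // `compare` writes the metric to stderr and exits with status 1 when
  // the images differ, so an exit status of 1 is not an error here.
  const res = spawnSync('compare',
    ['-metric', 'PSNR', referencePath, actualPath, 'null:']);
  if (res.error || res.status > 1) {
    throw new Error(`compare failed: ${res.stderr}`);
  }
  const out = res.stderr.toString().trim();
  return out === 'inf' ? Infinity : parseFloat(out); // identical images report "inf"
}

// Higher PSNR means more similar; a fixed cutoff (e.g. >= 30 dB) could
// replace per-machine reference screenshots.
console.log(psnr('reference.png', 'actual.png'));
```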
This is partially solved by: https://github.com/CartoDB/renderer-prototype/pull/51.
So let's rename this ticket to "Integration tests".
Having a threshold is OK. In a 256×256 image, a 0.3% error means ~200 differing pixels. We can live with that margin of error, or even a higher one, most of the time.
The tolerance we used for comparing PNG32 and PNG8 images in Windshaft was set to 8.5%. At the time, we used the simple FUZZ metric. You can check the differences in https://github.com/CartoDB/Windshaft/blob/master/test/results/compare.html.
reference | actual | result
---|---|---
*(image)* | *(image)* | 🎉
*(image)* | *(image)* | 🎉
*(image)* | *(image)* | 🎉
reference | actual | result
---|---|---
*(image)* | *(image)* | 👎
Acceptance tests: `acceptance` folder, `cartogl` user.

The current metric tool, pixelmatch, detects the number of different pixels between two images using a threshold that defaults to 0.1. This threshold is used to compute `maxDelta`, the maximum acceptable squared distance between two colors. Taking into account that the maximum possible value of the YIQ difference metric is 35215, the relation between `threshold` and `maxDelta` is:

`maxDelta = 35215 * threshold * threshold`

For each pixel position in both images, a `delta` value (the squared YIQ distance between the two colors) is computed. If this value is greater than or equal to `maxDelta`, the pixels are considered different; otherwise they are considered equal.
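To make that relation concrete, here it is as a tiny snippet; the 35215 constant and the 0.1 default come from pixelmatch, while the function name is ours:

```js
// maxDelta is the largest acceptable squared YIQ distance between two
// colors; 35215 is the maximum possible squared YIQ difference.
function maxDelta(threshold) {
  return 35215 * threshold * threshold;
}

maxDelta(0.1); // => 352.15 with pixelmatch's default threshold

// Per pixel: delta >= maxDelta(threshold) means the pixels differ.
```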
@Jesus89, thanks a lot for sharing those details. BTW, keep your eyes open for the upcoming blog post: https://twitter.com/mourner/status/966098017723076608. 👀