Closed: Jesus89 closed this issue 6 years ago.
Since this domain is hard to test, we must establish a QA process. This involves:
We can also try to automate this process using some screenshot testing utility.
I'm OK with jasmine and karma :rocket:
For the e2e/acceptance testing, Exquisite seems to be a good option, but I would look for alternatives in order to be sure that there is nothing better.
We agreed on having an `examples` folder with unitary examples for every use case, and on using automated screenshot testing to compare reference images against newly taken ones.
Every machine is different, and there are significant rendering differences between Linux and OSX that make our tests fail. Two proposals are on the table:
1. Having different references for every platform. This means using the Travis/CI screenshots as the reference set; if you want to run the acceptance tests locally, you have to take your own reference screenshots. Taking reference screenshots is a one-time process.
2. Increasing the tolerance of the image-diff algorithm, so that small differences between platforms are ignored and a single reference set works for everyone.
I suspect that we are going to need a threshold so high that the tests will become useless and will always pass. That's the reason why I lean towards option 1 (per-platform references).
cc @Jesus89 @davidmanzanares
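For illustration, the two proposals could be wired into a single check like the minimal sketch below, assuming a pixelmatch-based comparison with pngjs for decoding (both are assumptions at this point in the thread, though pixelmatch comes up later); the `test/references/<platform>` layout, the raised threshold, the helper name, and the 0.3% pixel budget are all illustrative, not the project's actual setup.

```js
// Sketch only: not the project's real harness.
const fs = require('fs');
const path = require('path');
const { PNG } = require('pngjs');         // assumed dev dependency
const pixelmatch = require('pixelmatch'); // assumed dev dependency

// Proposal 1: per-platform reference screenshots
// ('linux' on Travis/CI, 'darwin' on OSX).
const referenceDir = path.join('test', 'references', process.platform);

// Proposal 2: a single reference set with a raised per-pixel tolerance
// (pixelmatch's default threshold is 0.1).
const THRESHOLD = 0.2; // illustrative value

// Compares a freshly taken screenshot against its reference.
// Assumes both images have the same dimensions.
function screenshotMatches(name, actualPath) {
  const ref = PNG.sync.read(fs.readFileSync(path.join(referenceDir, name)));
  const act = PNG.sync.read(fs.readFileSync(actualPath));
  const mismatched = pixelmatch(ref.data, act.data, null,
                                ref.width, ref.height,
                                { threshold: THRESHOLD });
  // Accept up to ~0.3% differing pixels (a margin discussed below).
  return mismatched <= ref.width * ref.height * 0.003;
}

console.log(screenshotMatches('example.png', 'out/example.png'));
```

A real harness would pick one of the two knobs rather than both: per-platform reference sets make the strict default threshold viable, while a single shared reference set relies on the raised tolerance.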
> I suspect that we are going to need a threshold so high that the tests will become useless and will always pass.
I don't think so.
If I was gonna do this, I would use ImageMagick comparison metrics instead of local references because it's easier to test on new machines and environments.
Since we've extensively discussed this in other channels, I'm not gonna block a PR with the local references approach, but I don't think it's the right approach.
@davidmanzanares I think we should run some tests with the threshold approach.
What metrics do you suggest?
@IagoLast PSNR should work well enough, but ImageMagick has many: https://www.imagemagick.org/script/command-line-options.php#metric
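For what it's worth, such a PSNR check can be scripted from Node. A minimal sketch, assuming ImageMagick's `compare` binary is on the PATH (ImageMagick 7 ships it as `magick compare`), with hypothetical file names and an illustrative cutoff:

```js
// Sketch only: shelling out to ImageMagick's `compare` for a PSNR score.
const { spawnSync } = require('child_process');

function psnr(referencePath, actualPath) {
  // `compare` writes the metric to stderr and exits with status 1 when
  // the images differ, so an exit status of 1 is not an error here.
  const res = spawnSync('compare',
    ['-metric', 'PSNR', referencePath, actualPath, 'null:']);
  if (res.error || res.status > 1) {
    throw new Error(`compare failed: ${res.stderr}`);
  }
  const out = res.stderr.toString().trim();
  return out === 'inf' ? Infinity : parseFloat(out); // identical images report "inf"
}

// Higher PSNR means more similar; a fixed cutoff (e.g. >= 30 dB) could
// replace per-machine reference screenshots.
console.log(psnr('reference.png', 'actual.png'));
```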
This is partially solved by: https://github.com/CartoDB/renderer-prototype/pull/51.
So let's rename this ticket to "Integration tests".
Having a threshold is OK. In a 256×256 image, a 0.3% error means ~200 differing pixels. We can live with that margin of error, or even a higher one, most of the time.
The tolerance we used for comparing PNG32 and PNG8 images in Windshaft was set to 8.5%. At the time, we used the simple FUZZ metric. You can check the differences in https://github.com/CartoDB/Windshaft/blob/master/test/results/compare.html.
reference | actual | result
---|---|---
*(image)* | *(image)* | 🎉
*(image)* | *(image)* | 🎉
*(image)* | *(image)* | 🎉
reference | actual | result
---|---|---
*(image)* | *(image)* | 👎
Acceptance tests: `acceptance` folder, `cartogl` user.

The current metric tool, pixelmatch, detects the number of different pixels between two images using a threshold that defaults to 0.1. This threshold is used to compute `maxDelta`, the maximum acceptable squared distance between two colors. Taking into account that the maximum possible value of the YIQ difference metric is 35215, the relation between `threshold` and `maxDelta` is:

`maxDelta = 35215 * threshold * threshold`

For each pixel position in both images, a `delta` value (the squared YIQ distance between the two colors) is computed. If this value is greater than or equal to `maxDelta`, the pixels are considered different; otherwise they are considered equal.
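To make that relation concrete, here it is as a tiny snippet; the 35215 constant and the 0.1 default come from pixelmatch, while the function name is ours:

```js
// maxDelta is the largest acceptable squared YIQ distance between two
// colors; 35215 is the maximum possible squared YIQ difference.
function maxDelta(threshold) {
  return 35215 * threshold * threshold;
}

maxDelta(0.1); // => 352.15 with pixelmatch's default threshold

// Per pixel: delta >= maxDelta(threshold) means the pixels differ.
```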
@Jesus89, thanks a lot for sharing those details. BTW, keep your eyes open for the upcoming blog post: https://twitter.com/mourner/status/966098017723076608. 👀