MakieOrg / Makie.jl

Interactive data visualizations and plotting in Julia
https://docs.makie.org/stable
MIT License

Refimage test accuracy #1387

Open ffreyer opened 2 years ago

ffreyer commented 2 years ago

I noticed in #1382 (specifically https://github.com/JuliaPlots/Makie.jl/runs/3884166657) that about half the refimage tests still pass when the output is just white. It would probably be good to lower the score threshold so that only smaller differences are accepted, and maybe cut or rework some tests that can't reach high accuracy.

For reference, these should be all the refimage tests that passed when comparing to a blank white output in that test run:

0.0003 == text_266_latexsimple.png
0.0022 == text_289_latexupdates/step-1.png
0.0022 == text_310_updateannotationstyle/step-1.png
0.0031 == text_185_empty_lines.png
0.0033 == text_289_latexupdates/step-2.png
0.0033 == text_310_updateannotationstyle/step-2.png
0.0039 == text_277_latexbb.png
0.0087 == text_113_multi_boundingboxes.png
0.0088 == text_141_single_boundingboxes.png
0.0116 == text_80_single_strings_single_positions_justification.png
0.0125 == attributes_40_position.png
0.0129 == attributes_16_font.png
0.0129 == attributes_3_align.png
0.0131 == short_tests_22.png
0.0131 == short_tests_23.png
0.0142 == attributes_46_rotation.png
0.0144 == examples3d_422_UnicodeMarker.png
0.0147 == examples3d_128_Markersizes.png
0.0147 == text_32_single_strings_single_positions.png
0.0148 == text_55_multi_strings_multi_positions.png
0.0149 == short_tests_21.png
0.0150 == short_tests_56.png
0.0158 == examples3d_124_scatter.png
0.0171 == short_tests_49.png
0.0178 == attributes_22_glowcolor,glowwidth.png
0.0179 == short_tests_57.png
0.0183 == attributes_54_visible.png
0.0192 == text_19_dataspace.png
0.0201 == text_173_text_in_3d_axis.png
0.0208 == text_330_latexticks.png
0.0210 == examples2d_305_LineFunction.png
0.0223 == examples3d_132_RecordVideo.mp4
0.0237 == text_344_dynamiclatexticks.png
0.0291 == short_tests_20.png
0.0319 == short_tests_68.png
0.0320 == examples2d_142_TextAnnotation.png
0.0320 == text_252_latexstrings.png
0.0322 == examples3d_116_MeshscatterFunction.png
0.0363 == text_202_3Dscreenspaceannotations.png
0.0391 == short_tests_19.png
0.0398 == examples3d_290_Fluctuation3D.png
0.0413 == examples3d_342_ConnectedSphere.png
0.0442 == short_tests_67.png
0.0446 == examples3d_168_Contour3d.png
0.0451 == examples2d_199_Linechangingcolour.mp4
0.0455 == short_tests_25.png
0.0499 == short_tests_42.png
0.0502 == short_tests_51.png
0.0526 == short_tests_1.png
0.0555 == examples3d_239_FEMmesh3D.png
0.0572 == examples2d_120_scale_plot.png
0.0581 == examples3d_69_ColoredMesh.png
0.0585 == examples3d_60_TexturedMesh.png
0.0590 == examples2d_129_Polygons.png

ffreyer commented 2 years ago

Here are all the scores from that test run as well as results from previous runs that did not change anything visually. To summarize:

Maybe we could try cleaning up the worst offenders in WGLMakie and aim for 0.02 as a threshold? That would still leave 28 cases in the list above passing against a blank white output. Maybe for those we can create some more noise in the refimages?
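
As a quick check of the count above, here is a minimal sketch that tallies how many of the listed blank-output scores would still pass at a given threshold. The scores are copied from the list in the first comment. The helper name `count_passing` and the `score <= threshold` pass rule are assumptions for illustration, not Makie's actual test code.

```python
# Scores of the refimage tests that passed against a blank white output,
# copied in order from the list in the first comment (54 entries).
SCORES = [
    0.0003, 0.0022, 0.0022, 0.0031, 0.0033, 0.0033, 0.0039, 0.0087, 0.0088,
    0.0116, 0.0125, 0.0129, 0.0129, 0.0131, 0.0131, 0.0142, 0.0144, 0.0147,
    0.0147, 0.0148, 0.0149, 0.0150, 0.0158, 0.0171, 0.0178, 0.0179, 0.0183,
    0.0192, 0.0201, 0.0208, 0.0210, 0.0223, 0.0237, 0.0291, 0.0319, 0.0320,
    0.0320, 0.0322, 0.0363, 0.0391, 0.0398, 0.0413, 0.0442, 0.0446, 0.0451,
    0.0455, 0.0499, 0.0502, 0.0526, 0.0555, 0.0572, 0.0581, 0.0585, 0.0590,
]


def count_passing(scores, threshold):
    """Count scores that would still pass, assuming 'score <= threshold' passes."""
    return sum(1 for score in scores if score <= threshold)


if __name__ == "__main__":
    for threshold in (0.06, 0.03, 0.02, 0.01):
        passing = count_passing(SCORES, threshold)
        print(f"threshold {threshold}: {passing} blank-output tests still pass")
```

At a threshold of 0.02 this reports 28 tests, matching the figure above. None of the listed scores equals 0.02 exactly, so the choice between `<` and `<=` does not affect that count.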

ffreyer commented 2 weeks ago

Need to check if this is still true

jkrumbiegel commented 2 weeks ago

It shouldn't be, because I switched to a fixed tile-based comparison at some point, and within these smaller tiles the sensitivity is higher.
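
To illustrate why smaller tiles are more sensitive, here is a rough sketch comparing a whole-image score with a per-tile score. The metric (mean absolute difference), the tile size of 8, and the function names are all assumptions for illustration. The thread does not state the metric Makie's reference tests actually use.

```python
def mean_abs_diff(a, b, rows, cols):
    """Mean absolute pixel difference over the given row and column ranges."""
    total = sum(abs(a[r][c] - b[r][c]) for r in rows for c in cols)
    return total / (len(rows) * len(cols))


def global_score(a, b):
    """One score for the whole image: a small feature is averaged away."""
    return mean_abs_diff(a, b, range(len(a)), range(len(a[0])))


def tiled_score(a, b, tile=8):
    """Worst score over fixed-size tiles: a small feature dominates its tile."""
    height, width = len(a), len(a[0])
    worst = 0.0
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            cols = range(left, min(left + tile, width))
            worst = max(worst, mean_abs_diff(a, b, rows, cols))
    return worst


# Hypothetical 64x64 grayscale reference: white background, one 4x4 black marker.
reference = [[1.0] * 64 for _ in range(64)]
for r in range(8, 12):
    for c in range(8, 12):
        reference[r][c] = 0.0

# Output under test: a completely blank white image.
blank = [[1.0] * 64 for _ in range(64)]

if __name__ == "__main__":
    print("global score:", global_score(reference, blank))
    print("tiled score: ", tiled_score(reference, blank))
```

In this toy case the missing marker changes only 16 of 4096 pixels, so the global score is about 0.004 and would pass even a strict threshold. The tile containing the marker scores 0.25, which any reasonable threshold would reject.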

The threshold is still higher than it would need to be if some noise weren't introduced by movie compression; I think the movie frames have the highest diff scores.

SimonDanisch commented 2 weeks ago

It's much better, but @ffreyer, we still had a PR that didn't fail even though the colors of a meshscatter changed in WGLMakie... So maybe it could still do with some tweaking?