ciemss / pyciemss

Causal and probabilistic reasoning with continuous time dynamical systems

AssertionError: trajectories: PNG Histogram divergence: Shannon Jansen value 0.1562242136437859 > 0.04 #481

Closed (sabinala closed this 7 months ago)

sabinala commented 7 months ago

Running pytest fails on a test in tests/visuals/test_schemas.py with the following error message.

============================================= FAILURES ==============================================
_______________________ test_export_PNG[schema_file2-ref_file2-trajectories] ________________________

schema_file = PosixPath('/Users/altu809/Projects/pyciemss/pyciemss/visuals/schemas/trajectories.vg.json')
ref_file = PosixPath('/Users/altu809/Projects/pyciemss/tests/visuals/reference_images/trajectories.png')
name = 'trajectories'

    @pytest.mark.parametrize("schema_file, ref_file, name", schemas(ref_ext="png"))
    def test_export_PNG(schema_file, ref_file, name):
        """
        Test all default schema files against the reference files for PNG files

        schema_file: default schema files saved within the visuals module
        ref_file: compare the created  png to this reference file
        name: stem name of reference file
        """
        with open(schema_file) as f:
            schema = json.load(f)

        image = plots.ipy_display(schema, format="PNG", dpi=72).data
        save_result(image, name, "png")

        test_threshold = 0.04
        JS_boolean, JS_score = png_matches(image, ref_file, test_threshold)
>       assert (
            JS_boolean
        ), f"{name}: PNG Histogram divergence: Shannon Jansen value {JS_score} > {test_threshold} "
E       AssertionError: trajectories: PNG Histogram divergence: Shannon Jansen value 0.1562242136437859 > 0.04 
E       assert False

tests/visuals/test_schemas.py:148: AssertionError

djinnome commented 7 months ago

This issue is caused by an overly aggressive test threshold that doesn't account for hardware-dependent differences in how images are rendered. Setting test_threshold = 0.2 should still catch anything egregious without failing on expected hardware variation. Thanks to @JosephCottam for explaining the issue.
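
For readers unfamiliar with the score in the error message, here is a minimal sketch of a histogram-based Jensen-Shannon comparison against a threshold. It is not the project's `png_matches`; the names `js_divergence` and `histograms_match` are hypothetical, and the sketch assumes the score is the base-2 divergence over pixel-value histograms (the real helper may use a different variant, such as the Jensen-Shannon distance).

```python
import math
from collections import Counter


def js_divergence(hist_a, hist_b):
    """Jensen-Shannon divergence (base 2) between two count histograms.

    Both arguments map a bin (here, a pixel value) to a count.
    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with no bins in common.
    """
    total_a = sum(hist_a.values())
    total_b = sum(hist_b.values())
    score = 0.0
    for key in set(hist_a) | set(hist_b):
        p = hist_a.get(key, 0) / total_a
        q = hist_b.get(key, 0) / total_b
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            score += 0.5 * p * math.log2(p / m)
        if q > 0:
            score += 0.5 * q * math.log2(q / m)
    return score


def histograms_match(pixels_a, pixels_b, threshold):
    """Return (matches, score) for two flat sequences of pixel values."""
    score = js_divergence(Counter(pixels_a), Counter(pixels_b))
    return score <= threshold, score


# Hypothetical 4-pixel "images": identical, then completely disjoint.
same = histograms_match([0, 0, 255, 255], [0, 0, 255, 255], 0.2)
disjoint = histograms_match([0, 0, 0, 0], [255, 255, 255, 255], 0.2)
```

Under these assumptions, raising the threshold only widens the band of scores treated as a match; the score itself is unchanged.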

JosephCottam commented 7 months ago

For more context, we tried to write a test that checks that different machines produce roughly the same PNG. But different OS/hardware pairings handle things like alpha compositing and anti-aliasing differently, so this ended up being vexingly hard. We thought we had it worked out, but apparently not.

Upping the threshold will still catch RADICALLY different plots, so it will still ensure that the same general type of plot is produced. I think moving it to 0.2 is the right move for "make progress, catch bad things". We can refine the definition of "make consistent plots" when we get further along.
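
The trade-off between the two thresholds can be illustrated with a toy example. This is a deliberately exaggerated sketch, not a reproduction of the failing test: the synthetic images, the `render` helper, and the `js_divergence` function are all hypothetical, and real anti-aliasing differences are subtler than smearing a line into mid-gray.

```python
import math
from collections import Counter


def js_divergence(pixels_a, pixels_b):
    """Base-2 Jensen-Shannon divergence between two pixel-value histograms."""
    hist_a, hist_b = Counter(pixels_a), Counter(pixels_b)
    n_a, n_b = len(pixels_a), len(pixels_b)
    score = 0.0
    for value in set(hist_a) | set(hist_b):
        p, q = hist_a[value] / n_a, hist_b[value] / n_b
        m = 0.5 * (p + q)
        if p > 0:
            score += 0.5 * p * math.log2(p / m)
        if q > 0:
            score += 0.5 * q * math.log2(q / m)
    return score


SIZE = 100
LINE_ROWS = range(10, 100, 10)  # nine horizontal lines


def render(background, line_value, thickness):
    """Flat pixel list for a SIZE x SIZE grayscale image with horizontal lines."""
    lined = {row + offset for row in LINE_ROWS for offset in range(thickness)}
    return [
        line_value if row in lined else background
        for row in range(SIZE)
        for _ in range(SIZE)
    ]


reference = render(255, 0, 1)   # crisp 1px black lines on white
smoothed = render(255, 128, 2)  # same lines smeared into 2px of mid-gray
inverted = render(0, 255, 1)    # white lines on black: a different plot

smoothed_score = js_divergence(reference, smoothed)
inverted_score = js_divergence(reference, inverted)

# Show which variants each threshold would accept.
for threshold in (0.04, 0.2):
    print(threshold, smoothed_score <= threshold, inverted_score <= threshold)
```

In this toy setup the smeared variant fails at 0.04 but passes at 0.2, while the inverted image fails at both, which is the "make progress, catch bad things" behavior described above. A histogram comparison ignores where pixels are located, so it cannot detect layout changes that leave the distribution of pixel values intact.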