JuliaTesting / ReferenceTests.jl

Utility package for comparing data against reference files
https://juliatesting.github.io/ReferenceTests.jl/latest/
Other
82 stars 14 forks source link

Clarification on psnr_equality(threshold) #74

Closed juliohm closed 1 year ago

juliohm commented 3 years ago

From the docstring it seems that the threshold t has a counter-intuitive definition. It is not the usual tolerance (a - b) <= t, but instead a measure of agreement (a - b) >= t?

The higher the threshold the more strict is the test? Could you please confirm?

johnnychen94 commented 3 years ago

Correct, this psnr_equality(t) serves as an alternative to ; the closer the two images are, the lower ||a - b|| is, and the higher the PSNR is.

juliohm commented 3 years ago

Perhaps you could consider flipping the definition? It would be more consistent with the Base approach (e.g. isapprox) where equality and thresholds have a standard meaning. The word threshold for me always refers to a form of tolerance in a comparison.

In either case, do you have a general rule of thumb for the actual value used in the threshold? I see that the default value is 25.

johnnychen94 commented 3 years ago

Perhaps you could consider flipping the definition?

Higher PSNR means lower tolerance, how could we flip the definition? We need a true to indicate a test pass here. If you need a custom reference equality test metric, it's very easy to define one, see ImageBinarization for an example.

In either case, do you have a general rule of thumb for the actual value used in the threshold? I see that the default value is 25.

No, I don't have one, it's chosen by experiences to replace the previous test_approx_eq_sigma_eps macro used in Images.jl

johnnychen94 commented 3 years ago

Feel free to open PR(s) to add more "equality" test functors by using metrics from Distances.jl if you're not satisfied with the current PSNR equality.

juliohm commented 3 years ago

Higher PSNR means lower tolerance, how could we flip the definition?

I think the issue is more about the naming? When I read psnr_equality(threshold) I immediately mix the notion of peak-signal-to-noise-ratio and equality, and consequently I think of threshold as something that is an upper bound, not a lower bound, makes sense?

Going even further, I think in most practical cases no one will want to customize the metrics of comparison assuming that someone in ReferenceTests.jl has thought about this issue carefully and will provide a good metric by default. I would expect a more intuitive option like:

@test_reference "foo.png" img tol=1e-2
juliohm commented 3 years ago

Would it be reasonable to add an option tol besides by with a canonical threshold semantic? I dont usually need to customize the metric, but I need to customize the tolerance of the test.

johnnychen94 commented 3 years ago

A missing euclidean_equality(threshold) is all you need, I guess.

johnnychen94 commented 3 years ago

There's a plan to deprecate the current API @test_reference in favor of match_referece which returns a Bool value.

- @test_reference "foo.png" img
+ @test match_reference("foo.png", img)

so I don't want to add more keywords to the current API. I will definitely consider a better default equality option for image types when it's in progress.

juliohm commented 3 years ago

I think the issue is more profound. It is about considering a consistent and simple-to-use approach to comparing two objects according to a tolerance. Somewhat a generalization of isapprox where people in ReferenceTests.jl spent time figuring out a good default for the behavior of each type. If PSNR is a good default for images or Euclidean distance, that is fine. Do you think we can always express the comparison in terms of (a-b) < t ? For example, in the PSNR case I would simply flip the sign of the inequality by multiplying the result of PSNR by minus one (a-b) >= t -----> -(a-b) <= -t This way both Euclidean and PSNR follow the same notion of "tolerance" as an upper bound.

juliohm commented 3 years ago

I think the point of this issue is to say that the threshold may confuse new users of the package. I got confused coming from VisualRegressionTests.jl for example.

johnnychen94 commented 3 years ago

I'm not standing in the way of adding an euclidean_equality or whatever equality to replace the default psnr_equality here for image types, as long as it makes sense.

Unlike VisualRegressionTests.jl, ReferenceTests.jl does not provide a strong assumption on the input type and reference type, and this is why we expose the by keyword instead of tol. This package is designed for package developers so flexibility and faithfulness are more important than convenience. Said that, I don't think @test_reference "foo.png" img tol=1e-2 is more convenient than @test_reference "foo.png" img by=euclidean_equality(1e-2), except that it's two words less.

I'll update the docstring of psnr_equality to clarify the meaning of tolerance. (#75)

juliohm commented 1 year ago

Thank you for clarifying all issues @johnnychen94. Closing the discussion.