Closed melisande-c closed 1 month ago
This is implemented with numpy.testing.assert_almost_equal, called in https://github.com/bioimage-io/core-bioimage-io-python/blob/9e1e1fe6c7a916411c644590b91ec57d1ed06c18/bioimageio/core/_resource_tests.py#L152
So the actual test is evaluating abs(desired-actual) < float64(1.5 * 10**(-decimal))
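As a quick sketch of that criterion (using numpy's documented `decimal` semantics, not the core code itself):

```python
import numpy as np

# numpy's documented check: abs(desired - actual) < 1.5 * 10**(-decimal)
np.testing.assert_almost_equal(1.0, 1.02, decimal=1)  # diff 0.02 < 0.15 -> passes

try:
    np.testing.assert_almost_equal(1.0, 1.02, decimal=2)  # diff 0.02 > 0.015
except AssertionError:
    print("fails at decimal=2")
```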
The argument name `decimal` (taken from the numpy function used) can be confusing though, thanks for raising this issue!
The testing behavior should be documented!
TODO: document the `decimal` argument in:
- `test_model`
- `test_description`
- `load_description_and_test`
- `TestCmd.decimal`
- `commands.test`
- `test_decimal` in onnx converters

Hi @FynnBe,
Thanks for the quick response! However, this still has the problem that the tolerance is not relative to the size of the values in the data. Say the output is `12345.6` and the expected output is `12345.7`: the difference is `0.1` and so would fail the test for `decimal=1`, even though by a lot of metrics the values are pretty close.
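A minimal illustration of the scale problem, using the numbers above (note that numpy's extra factor of 1.5 means `decimal=2` is used here to actually trigger the failure):

```python
import numpy as np

actual, desired = 12345.6, 12345.7  # relative difference ~8e-6

# absolute check: abs diff 0.1 > 1.5 * 10**(-2), so this fails
try:
    np.testing.assert_almost_equal(actual, desired, decimal=2)
except AssertionError:
    print("absolute check fails despite ~6 matching significant figures")

# relative check: 0.1 / 12345.7 ~ 8.1e-6 < 1e-4, so this passes
np.testing.assert_allclose(actual, desired, rtol=1e-4)
```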
These differences can arise for a few different reasons, but one situation where I have noticed small differences in outputs is when using different batch sizes to process the predictions in PyTorch. I think the core tests the outputs with both a batch size of 1 and 2?
This problem has been encountered a few times when trying to export CAREamics models. There is a convenience function `export_to_bmz` that allows users to export their models with limited arguments, and I am considering calculating the decimal precision from the magnitude of the input data values, but if this could be handled on the core side that would be preferable.
Thanks, Melisande
oh of course you're right! Yes, I suppose the cleanest would be to allow specifying an absolute and a relative tolerance like xarray implements it: https://docs.xarray.dev/en/stable/generated/xarray.testing.assert_allclose.html
I propose to:
- add `rtol` and `atol` (or `relative_tolerance` and `absolute_tolerance` to be more explicit)
- deprecate `decimal`
- map `decimal` to `atol` if `atol` is not specified
- use `xarray.testing.assert_allclose`
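The backward-compatible mapping in the last point above could look roughly like this (a sketch with a hypothetical helper name; the real signature may differ):

```python
from typing import Optional


def resolve_atol(decimal: Optional[int], atol: Optional[float]) -> float:
    """Hypothetical helper: derive atol from a legacy `decimal` value."""
    if atol is not None:
        return atol  # an explicitly given atol wins
    if decimal is not None:
        # reproduce assert_almost_equal's threshold: 1.5 * 10**(-decimal)
        return 1.5 * 10 ** (-decimal)
    return 1e-8  # some default; the actual default is up for discussion


# resolve_atol(decimal=4, atol=None) -> 1.5e-4
```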
This is maybe a good time to consider including these (`atol` and `rtol`) in the official spec (instead of having them in the `config` field).
> yes, I suppose the cleanest would be to allow to specify an absolute and a relative tolerance like xarray implements it: https://docs.xarray.dev/en/stable/generated/xarray.testing.assert_allclose.html
Just had a quick look: numpy also has `testing.assert_allclose` with `atol` and `rtol` arguments: https://numpy.org/doc/stable/reference/generated/numpy.testing.assert_allclose.html
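For reference, `numpy.testing.assert_allclose` combines the two tolerances as `abs(actual - desired) <= atol + rtol * abs(desired)`, so each one covers the regime the other handles poorly:

```python
import numpy as np

desired = np.array([1e-9, 1.0, 1e5])
actual = desired * (1 + 1e-6)  # ~0.0001% relative error everywhere

# passes: each element satisfies abs diff <= atol + rtol * abs(desired)
np.testing.assert_allclose(actual, desired, rtol=1e-5, atol=1e-8)

# atol rescues near-zero values: for desired=1e-9 an additive error of 1e-7
# would exceed rtol * 1e-9 = 1e-14, but stays under atol=1e-6
np.testing.assert_allclose(desired + 1e-7, desired, rtol=1e-5, atol=1e-6)
```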
PR welcome ;-)
I would make a PR, but do you need a deprecation warning for `decimal`?
Yes, we cannot have exceptions raised for a suddenly missing `decimal` argument.
A simple change of the default value to `decimal: Optional[int] = None` and logging a warning à la `if decimal is not None: logger.warning("deprecated in favor of atol")` will do.
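A sketch of that deprecation shim (using stdlib `logging` and a hypothetical wrapper function here; the actual code base may use a different logger and signatures):

```python
import logging
from typing import Optional

import numpy as np

logger = logging.getLogger(__name__)


def check_outputs(actual, desired, *, decimal: Optional[int] = None,
                  atol: float = 1e-4, rtol: float = 1e-4) -> None:
    """Hypothetical wrapper showing the proposed deprecation of `decimal`."""
    if decimal is not None:
        logger.warning("`decimal` is deprecated in favor of `atol`/`rtol`")
        # reproduce the old absolute-only check
        atol, rtol = 1.5 * 10 ** (-decimal), 0.0
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)
```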
Problem
Some datasets have large values, so when validating that an output matches the given expected output, it doesn't always make sense to use decimal precision as a metric. Say, for example, a dataset has values on the order of `1e5`; then the output could match the expected output to 5 significant figures but still fail the validation test, even if the decimals are checked only up to the first position past the decimal point.

Solution
It would be desirable to be able to specify something like the number of significant figures or some kind of relative tolerance to validate the outputs with.
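One way to express "match to N significant figures" is as a relative tolerance, which is effectively what an `rtol` parameter would provide (hypothetical helper, for illustration only):

```python
def matches_sig_figs(actual: float, desired: float, sig_figs: int) -> bool:
    """Hypothetical check: agreement to roughly `sig_figs` significant figures,
    expressed as a relative-error bound."""
    if desired == 0:
        return actual == 0
    rel_err = abs(actual - desired) / abs(desired)
    return rel_err < 0.5 * 10 ** (1 - sig_figs)


# values on the order of 1e5, as in the example above
print(matches_sig_figs(123456.0, 123457.0, 5))  # relative error ~8.1e-6 -> True
print(matches_sig_figs(123456.0, 123457.0, 6))  # -> False
```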