Improving tests - Githubissues

AngusMcLure commented 10 months ago

The current tests are excellent in terms of code coverage but there are a few things that could be improved. I've suggested a list of improvements as a starting point. We can refine the list as we discuss:

[ ] Every exported function has at least one test for typical* parameter values
[ ] Every exported function with a form argument has a test, using typical* parameter values, for each of the primary forms: 'beta', 'logitnorm' and 'cloglognorm'
[ ] Every exported function has at least one test for 'extreme' values
[ ] Use normalised values for expect_equal() tests to avoid issues when the expected value is smaller than tolerance **

*Typical values: While we've discussed the technically allowable values of the arguments in detail, I'm not sure I've written out what the typical values would be. There will probably be cases that don't fit in these, but I would suggest that the following ranges should be considered typical for the purposes of making tests:

prevalence <0.1
correlation <0.2
specificity >0.99
sensitivity >0.9
pool_size <1/prevalence
pool_number < 10

Of these the most critical is probably prevalence. A prevalence of 0.001 or smaller even is not really an extreme value. On the other hand, if prevalence is more than about 10% (0.1) then it stops making sense to do pool testing and should be considered extreme. The larger the pools the lower the prevalence needs to be considered 'typical'. Hence the general rule pool_size <1/prevalence.

** expect_equal uses base::all.equal() as the backend. This has the slightly frustrating problem of using relative tolerance (fractional difference) when the second of the two numbers being compared is greater than tolerance, but uses absolute tolerance (absolute difference) when the numbers are less than the tolerance. So for instance these both throw errors

testthat::expect_equal(1+2e-6,1e-6,tolerance = 1e-6)
testthat::expect_equal(1e-5,1e-6,tolerance = 1e-6)

but this does no

testthat::expect_equal(1e-6,1e-7, tolerance = 1e-6)

Note that in the first example the two numbers differ by a tiny proportion (just over the tolerance of 1e-6 or 0.0001%), but in the second example the total difference is smaller in absolute terms, but the first number is 10 times bigger than the second number. However, in the third example you still have a 10-fold difference, but there's no error message. This is annoying behaviour!

One way to make sure that it always uses relative tolerance would be to divide the function call by the expected answer and then compare with 1. E.g. if the function is f(1) and we expect the result to be 2e-8 then we might use

testthat::expect_equal(f(1)/2e-8, 1, tolerance = 1e-6)

Now, I'm sure I'm not the first person to notice this problem with expect_equal, so it might also be worth digging around to see if there's another canonical way of doing this in testthat

fredjaya commented 9 months ago

@AngusMcLure A few questions about testing with relative tolerance, particularly with outputs that have multiple values (matrices and lists).

Is the mean relative difference mean(abs(actual - expected)) / mean(abs(expected)) a good way of catching precision errors in an e.g. matrix? Will this be effective in matrices with a wide range. For example a matrix output from fi_pool_cluster():

matrix(c(3793.83363, -39.6160580, -39.6160580, 0.6903514), nrow = 2)

Or should individual matrix/list elements be compared independently? I think this will require setting the tolerance dynamically.

(The tests in the latest commits I think are quite broken)

AngusMcLure commented 8 months ago

For matrices I think we should be checking each entry individually, but using the division trick to force it to use relative tolerance. I am happy for the same relative tolerance for each of the numbers in the matrix, so no need to set tolerance dynamically.

So in code this would be something like:

testthat::expect_equal(actual/expected, matrix(1, nrow = nrow(expected), ncol = ncol(expected)), tolerance = 1e-3)

AngusMcLure / PoolPoweR

Improving tests #27