OpenWaterAnalytics / EPANET

The Water Distribution System Hydraulic and Water Quality Analysis Toolkit
MIT License

Possible bugs in regression testing using nrtest #635

Open LRossman opened 3 years ago

LRossman commented 3 years ago

This project's CI testing uses the Python package nrtest to perform regression tests that compare results from a new build of the EPANET library against benchmark results for 51 different input files. If I understand the process correctly (which perhaps I don't), it uses two different types of comparisons on each individual pair of test results (new test value vs. benchmark reference value). The first is a closeness test which requires that:

abs(test_value - ref_value) <= atol + rtol * abs(ref_value) 

while the second is a "correct decimal digits" test that requires:

-log10(abs(test_value - ref_value) / abs(ref_value)) >= atol

where atol and rtol are project-supplied constants.
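The two comparisons described above can be sketched in Python as follows. This is only an illustration of the formulas as stated in this issue, not the actual nrtest-epanet code; the function names are mine.

```python
import math

def closeness_test(test_value, ref_value, rtol, atol):
    """Pass when the absolute difference is within atol + rtol * |ref|."""
    return abs(test_value - ref_value) <= atol + rtol * abs(ref_value)

def cdd_test(test_value, ref_value, atol):
    """Pass when the number of correct decimal digits is at least atol."""
    if test_value == ref_value:
        return True  # identical values: infinitely many matching digits
    cdd = -math.log10(abs(test_value - ref_value) / abs(ref_value))
    return cdd >= atol
```

Note that with atol = 0 the closeness test reduces to a purely relative criterion, which is what makes comparisons between very small numbers problematic.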

I believe that the code for these tests is supplied to nrtest from the file __init__.py located in the project's \tools\nrtest-epanet\nrtest-epanet\ directory. However, the CDD test (located in the epanet_mincdd_compare function) is missing the "/ abs(ref_value)" term from the definition of a CDD -- it uses the log of the absolute difference instead of the relative difference.

In addition, it seems that the batch command file run-nrtest.cmd that sets up and runs nrtest uses a single set of rtol and atol values, set to 0.01 and 0.0, respectively, for both types of tests. If this is the case, then the "closeness" test would be based solely on the relative tolerance and would cause comparisons between very small numbers that are actually close to one another to fail. For the CDD test, a tolerance of 0 implies that all comparisons will pass, since it requires only that 0 digits be the same. However, since the incorrect CDD formula is being used (I think), a comparison between 123456.0 and 123457.1 would fail, since the negative log of the absolute difference (-log10(1.1) = -0.04) is less than 0 even though the true CDD is 5.

I hope that someone from the community with more knowledge of the CI process than I have would check my claims to see if they are valid or not.

LRossman commented 3 years ago

As a part of PR #634 I modified the CI regression test protocol to address the concerns raised in this issue. That PR made a one-line addition for pressure dependent demand analysis that allowed a previously non-converging network to converge. However, it failed under the original CI test protocol because it produced a status report for a PDA benchmark network that converged in one fewer trial, and a difference of 1.6e-13 in the value of some output variable for another network.

The changes I made to the protocol are:

  1. The absolute tolerance used to compare the closeness of test results to benchmark values was changed from 0 to 0.0001, thus avoiding test failures for differences between very small numbers.
  2. The "correct decimal digits" test was dropped since it was being implemented incorrectly.
  3. The check for identical status report contents was removed, since it poses an impediment to accepting code changes that produce more accurate solutions in fewer iterations.
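To illustrate why change 1 matters, here is a small sketch using the closeness test described earlier in this issue (the tiny values are hypothetical). With atol = 0 a pair of practically identical small values fails on relative tolerance alone; with atol = 0.0001 they pass.

```python
def is_close(test_value, ref_value, rtol, atol):
    # Closeness test: pass when |diff| <= atol + rtol * |ref|.
    return abs(test_value - ref_value) <= atol + rtol * abs(ref_value)

test, ref = 2e-8, 1e-8  # hypothetical near-zero output values

print(is_close(test, ref, rtol=0.01, atol=0.0))     # False: fails on rtol alone
print(is_close(test, ref, rtol=0.01, atol=0.0001))  # True: absorbed by atol
```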

These changes allowed PR #634 to pass. I believe they are reasonable and make it easier to accept algorithmic improvements to the code without sacrificing the ability of CI testing to identify changes that produce significant differences in results obtained for the suite of benchmark networks. If the community agrees with them I will proceed to merge PR #634 into the dev branch.

LRossman commented 2 years ago

The CI regression test changes recommended here have been merged into the dev branch through PR #634. I will keep this issue open to make visible what's been done in case there is a need (for some reason) to undo the changes.