OpenWaterAnalytics / EPANET

The Water Distribution System Hydraulic and Water Quality Analysis Toolkit
MIT License

Convergence issues with networks in the latest dev branch #736

Open lbutler opened 1 year ago

lbutler commented 1 year ago

In a comment on a LinkedIn post, Paul Boulos made the following remarks:

You may want to use the USEPA official version of EPANET, which is version 2.2. I've tested the unofficial 2.3 version on an extensive database of actual water distribution networks varying in size from a couple thousand pipes to over 100,000 pipes and found that the unofficial version 2.3 diverged (i.e., did not converge) for many of those systems or produced different results. May be the reason why the USEPA has not yet released its official Version 2.3.

When I followed up and asked if he had shared these comments with the OWA-EPANET maintainers, he replied:

The USEPA is aware of the serious bugs in the unofficial 2.3 version. I had a good discussion with the USEPA at the 2023 EWRI congress on the future of EPANET. Since my version is significantly faster, more stable and provides more features and capabilities we are evaluating the possibility of providing our version to the USEPA and all future updates for the greater benefit of the water industry. We agreed to further meetings and discussions. Meanwhile, I suggest you use the official version of EPANET (2.2).

Are any of the other contributors or maintainers, especially those with USEPA connections, aware of any models that fail to converge in the latest development branch of OWA-EPANET?

Do we want to look at modifying the existing CI/CD tooling around testing network results to allow us to privately run tests on our own database/folder of models?

I have access to hundreds of real models, and I'm sure many of the other contributors here do too. Most of these cannot be shared, but it would be interesting to use them to confirm stable model results during development.

LRossman commented 1 year ago

The regression testing protocol in EPANET is similar to that of SWMM. A description of how the latter works can be found here (see the section starting with Regression Testing). The tools needed to run it are located in this project's tools and src/outfile folders. It is based on the nrtest framework.

dickinsonre commented 1 year ago

Has the original commenter on LinkedIn supplied any models that did not converge in EPANET 2.3? It may be my lack of understanding, but SWMM seems to have more test models.

lbutler commented 1 year ago

Thanks @LRossman - my thought was that we could modify, or create an alternative to, before-test.sh/cmd that would generate the benchmark results from a private collection of network models. I'll start working on this and then run it against the collection of models I have access to; we can then determine whether any further action is needed if we find issues.
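As a rough sketch of the direction I have in mind (the runepanet invocation, folder names, and file extensions below are placeholders and assumptions, not the actual script we would use):

```python
# Rough sketch only: batch-run a private folder of .inp models to build a
# benchmark set, mirroring what before-test.sh/cmd does for the public models.
# The "runepanet" executable name, its argument order, and the folder names
# here are assumptions for illustration.
import subprocess
from pathlib import Path

MODELS_DIR = Path("private-models")        # local folder, never committed
BENCHMARK_DIR = Path("benchmark-private")  # generated reference results
BENCHMARK_DIR.mkdir(parents=True, exist_ok=True)

for inp in sorted(MODELS_DIR.glob("*.inp")):
    rpt = BENCHMARK_DIR / f"{inp.stem}.rpt"
    out = BENCHMARK_DIR / f"{inp.stem}.out"
    # Assumed CLI usage: runepanet <input file> <report file> [<binary output file>]
    subprocess.run(["runepanet", str(inp), str(rpt), str(out)], check=True)
    print(f"Generated benchmark for {inp.name}")
```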

@dickinsonre - not yet, but I will ask Paul to share the models either here or privately with any of the contributors.

lbutler commented 1 year ago

I encountered several challenges while attempting to run the regression tests locally. I tried my Mac, a Windows machine, and a Windows server in a virtual machine. Although I managed to generate the output files (OUT) using the nrtest execute command, I could not run the comparison step with nrtest compare.

It seems to stem from the dependency epanet_output, which comes from the deprecated Python SWIG wrapper. Python on all three machines said a compatible version was unavailable. I ended up using my own JavaScript code to read the output files and wrote a comparison function resembling numpy's "allclose" to match the results.
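For reference, the check I was trying to mirror is numpy's allclose criterion, |test - ref| <= atol + rtol * |ref|, applied element-wise. A minimal Python sketch of that comparison (the values shown are purely illustrative, not taken from any of the models):

```python
import numpy as np

def results_close(test, ref, rtol=1e-5, atol=1e-8):
    """Element-wise tolerance check mirroring numpy.allclose:
    passes only if |test - ref| <= atol + rtol * |ref| for every value."""
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return bool(np.all(np.abs(test - ref) <= atol + rtol * np.abs(ref)))

# Illustrative comparison of node pressures from two runs (made-up numbers)
ref_pressures = [52.31, 48.07, 60.12]
new_pressures = [52.31, 48.08, 60.12]
print(results_close(new_pressures, ref_pressures, rtol=1e-3, atol=0.0))  # True
```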

The entire process of setting up and understanding the code executed during regression testing proved to be extremely challenging.

It might be beneficial to refactor the codebase to allow anyone cloning the repository to run tests with minimal effort. We could use Docker - allowing the build, unit tests, and regression tests to be run within a containerized environment, removing the issues with dependencies. However, I don’t want to rush into adding another tool to the already complex toolchain.

During my testing, I ran the 51 existing models and noticed some minor differences, although they were not statistically significant. These may have surfaced because my comparison function was not handling ‘rtol’ and ‘atol’ correctly; I will keep investigating.

In addition to the existing models, I also ran the first batch of 130 private models and found some significant differences in pressure values in a handful of them. I plan to narrow these models down to the affected sections and share them.

Almost all models converged successfully, except for a few instances where illegal parameters, such as a Darcy-Weisbach roughness of 0, were present.

I will continue my investigation and share further updates.

LRossman commented 1 year ago

@lbutler you can refer to issue #635 to see the current values for atol and rtol and how they are used in the current regression testing protocol.

LRossman commented 1 year ago

Regarding the question of convergence rates between v2.2 and 2.3 (the current dev branch), I just ran a set of 30 networks from my collection with sizes (expressed as number of pipes) as follows:

They all contained dozens (sometimes hundreds) of pumps and valves. The runs were made for just a single time period.

24 of 30 took exactly the same number of trials to converge. Two networks failed to converge with either version. The remaining 4 converged with slight differences in number of trials (38 v. 35, 63 v. 65, 35 v. 29, 69 v. 70). I think this presents strong evidence that v2.3 does not have convergence issues compared to v2.2.

lbutler commented 1 year ago

@LRossman I would agree that there is no evidence so far to suggest issues with convergence.

Ultimately the burden of proof should reside with the individual making the claim, but I have yet to receive anything from Paul. Because of the seriousness and public nature of the claim, I've still gone through the process to make sure there are no issues.

I think some interesting outcomes can still come from this investigation, such as making it easier to run the regression tests locally with both the public and a private dataset; I can open up separate issues for those soon.

The one network where I did find different pressure results involved a long cross-connection pipe between two tanks that was isolated from any supply, so it would not be producing realistic pressures either way. But I will still share those models and results shortly.

It's taking a bit of time, but I'm hoping in the next few days I can share what I've found and close off this issue if there are no true issues to report.