Wrong solution found when PRV is downstream of a PSV

LRossman commented 2 years ago

@eladsal informed me of a paper by Olivier Piller et al. presented at last month's WDSA-CCWI conference that gave an example of EPANET 2.2 finding the wrong solution for the network shown below:

EPANET's solution had the PSV open and the PRV active while the correct solution is just the opposite. As pointed out in the paper, the problem stems from EPANET disconnecting the network at each valve when both are active at some point in the Newton-Raphson iterations causing nodes K0062 and K0063 to not have any connection to a fixed grade node. The resulting solution matrix is no longer of full rank which eventually leads EPANET to produce the incorrect solution.

The fix is to use a small quantity (-1e-8) for the off-diagonal coefficients in the solution matrix that connects the two nodes of a PRV or PSV while adding the same positive amount to the diagonal coefficients of these nodes. A new branch,dev-prv-psv-fix, has been created that implements the fix. It correctly solves the Piller example while producing the same results (or very close to the same) as the current v2.2 does for the seven networks in the regression test suite that contain PRVs or PSVs.

eladsal commented 2 years ago

Thank you, @LRossman. Do you think the change in the results for wolf-3.inp is neglectable? https://ci.appveyor.com/project/OpenWaterAnalytics/epanet/build/job/7xmdv31fm4bncsxt#L321

For reference, here is the paper by Olivier Piller et al. 14753.pdf

LRossman commented 2 years ago

Thanks @eladsal for checking the Appveyor tests (which I failed to realize now get run whenever a new commit is pushed to the remote). My local testing was just looking at the flow through the PRVs/PSVs.

So wolf-3.inp is kind of weird in that it has numerous branches that dead end to a node with 0 demand (looks like it's trying to model individual house connections). While the current devbranch solution is showing 0 gpm at these nodes thedev-prv-psv-fixcode is showing some small amount of flow (see excerpt of a comparison of pipe flows below):

 Pipe 7714  | 0 | Pipe 7714  | 0.0151
 Pipe 21015 | 0 | Pipe 21015 | 0.0151
 Pipe 21096 | 0 | Pipe 21096 | 0.0151
 Pipe 21181 | 0 | Pipe 21181 | -0.0151
 Pipe 21199 | 0 | Pipe 21199 | 0.0151

The only active PRV in this network is 4004 which appears at the reservoir pump outlet that supplies the bulk of the network flow. Since its flow difference with the benchmark is only 0.008% it also seems strange that this could cause such noticeable flow differences elsewhere. So I'm concluding that this commit is not ready for prime time yet and needs more work to make its results closer to the wolf-3.inp benchmark.

fmartine commented 2 years ago

The situation raised by O.Piller is not a rational proposal from an engineering point of view. Let Qmax be the flow rate transferred between the two tanks when both valves are open. If the PSV valve is activated at a set point higher than the pressure obtained with the two valves open, while the PRV is kept open, the flow rate will be reduced to a value Q1 < Qmax. Conversely, if the PSV is kept open and the PRV is activated at a pressure lower than the pressure obtained with the two valves open, the flow rate will be reduced to a value Q2 > Qmax. If we now activate the two valves, the PSV setpoint determines Q1 and the PRV setpoint determines Q2, and in general will be Q1 <> Q2, which prevents the solver from obtaining a coherent solution, since if there is no demand at the intermediate nodes it should be Q1 = Q2.

I have reproduced this case with different values for reservoir elevations, pipe properties and valve settings, and in most cases, the EPANET 2.2 Toolkit gives warning messages, and rightly so. A possible solution would be to activate only the VSP, set the flow rate to Q1 and open the PRV, but it is also possible to open the VSP, activate the PRV and set the flow rate to Q2. The solver cannot be asked to make a specific decision, and the most reasonable is to issue a warning message.

I think this problem starts from an incoherent proposal and out of practice solution from an engineering point of view. So in my view, it is not worth prolonging the investigation. In general, my experience tells me that when the solver fails to find a solution most of the time it is because the formulated problem is not feasible in the real world. An engineering robust solution (desirable) usually leads to stable results with the EN solver. I can provide many examples where a bad design or a bad formulation of the hydraulic scenario is the main cause of convergence problems.

However, the proposal made by O Piller et al. in their communication to the WDSA-CCWI Conference in order to analyze the existence of a unique solution when forcing flows and pressures in a PDA or DDA model is very interesting, and also a source of error for many of my students. I encourage them to continue the research on this topic so that someday the Toolkit could detect these inconsistencies before looking for an equilibrium solution.

LRossman commented 2 years ago

@fmartine the issue is that the EPANET 2.2 solution to the Piller example is clearly wrong (the PSV is open even though its upstream pressure of 55 m is below the 58 m setting) while the solution found by the modified code satisfies the PSV/PRV conditions (the PSV is active at 58 m and the PRV is open because its pressure is below the 35 m setting), all flows are equal, and the head loss across all pipes and valves sums to the difference between the elevations of the two reservoirs at each end. There is nothing inherently wrong with the problem formulation.

Coming at this from another angle, try running the example in v2.2 again, but this time place a TCV in parallel with each valve and give it a fixed status of CLOSED. EPANET will produce the correct solution (PSV active, PRV open) since the possibility of having disconnected nodes at the control valves has been eliminated.

I've always felt that a robust hydraulic solver should produce a mathematically correct solution (which might include negative pressures) no matter how crazy the network setup thrown at it. Over the years people have produced examples where EPANET had issues and we have responded by making code changes to address them. The changes made for low resistance pipes is one example. I think we need to keep improving the robustness of the solver so that users can continue to have confidence in the results it produces.

fmartine commented 2 years ago

It seems that the EPANET 2.2 result also depends also on the pipe properties. I have reconstructed the example in SI units, with diameter 200 mm for all elements and lengths 1000, 1000 and 500 m from R0 (60 m) to R3 (30 m) and the results are correct. The VSP valve is active at 58 m and the VRP valve is open because it cannot meet the 35 m set point.

Now I have opened the VSP, so that the VRP becomes active to maintain the setpoint of 35 m, and the head upstream of the VSP becomes 50 m. Next I have activated the VSP again with different values for the setpoint pressure. For 48 m it should stay open because it cannot meet the setpoint (now it commands the PRV), but the engine fails. The same for 50 m (the pressure obtained for the open VSP), and for 52 m. From 54 m onwards it responds correctly, the VSP imposes its set point (it commands the VSP) and the PRV opens because it cannot meet the set point.

Finally I have added a TCV in bypass with the PRV and PSV and closed them, at Lew's suggestion, and everything works perfectly. For a VSP setpoint below 50 m, it commands the PRV and the VSP opens, while above 50 m, it commands the VSP and the PRV opens. I think the new proposal is going in the right direction, only that the addition of the TCV valve in bypass must be replaced by some internal procedure that does the same effect.

Sorry for not having understood correctly the existence of a unique solution. With a setpoint for the PRV of 58 m, activating the PRV and opening the PSV is not a valid solution because the PSV can still throttle to raise the pressure from 50 m to 58 m. However, activating the VSP and opening the VRP is valid, as the VRP can do nothing to raise the resulting pressure from 31 m to 35 m.

LRossman commented 2 years ago

It turns out that only an Active PSV needs to have its source code modified to preserve connectivity downstream of it. For an Active PRV, since its downstream node is "replaced" with a fixed grade node at the valve's setting and the valve is unidirectional, there is no concern about nodes downstream of it becoming disconnected.

When the dev-prv-psv-fix branch was modified in this manner, the Piller example was still solved correctly and the new commit passed the Appveyor CI tests.

LRossman commented 1 year ago

This issue has been resolved with PR #694 and can be closed.

OpenWaterAnalytics / EPANET

Wrong solution found when PRV is downstream of a PSV #692