I'd rather maintain it here if possible. On which network does the build fail?
It is currently failing on all three networks.
I guess my preference is to rollback.
My preference is to fix it here. I don’t consider the 3.x engine to be operational, and I have a pressing need for this feature in the 2.x engine. I’m thinking that there may be some platform differences also affecting the regression tests. I will investigate. I have a little time later this week.
We had a thread on testing - #16 (now closed). There we discussed ways to compare results of new versions, and I think that the way the tests are done now (using a fixed threshold of 1e-6 for all properties) is not what we intended. The last build failed on Travis CI because the fifth decimal place was different.
@eladsal Consider familiarizing yourself with the new regression testing framework taking shape in the tools directory of the feature-nrtest branch. It has been designed to allow easy development of alternate comparison routines, in part because of the discussion on issue #16 that you are referring to.
For clarification, it is the build in the feature-nrtest branch that is currently failing because the WQ results are different.
Will do. It wasn't clear to me which tests are relevant for this thread.
I had to run my VB test code to find that the main reason the tests are probably failing is that MAXMSG is defined differently in the two versions. Up to the dev branch MAXMSG was 79, and in nrtest it is 255. This probably shifted the entire binary output file. However, there are still some differences.
Funny, but this is what Brad was looking for :)
Wow, it completely did not occur to me that MAXMSG would be assumed in the binary format... but of course it is.
Well, it really shouldn't be, since messages are not written to the binary output, but the same constant (MAXMSG) is used for the title strings. We should change that and add MAXTITLE.
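Something along these lines, as a sketch (MAXTITLE is the new name proposed above; the values here are illustrative, not the actual source):

```c
/* messages.h (sketch) -- keep the status/error message buffer size
   independent of the title-line length, since the latter determines
   the layout of the binary output file. */
#define MAXMSG    79   /* max. length of a status/error message     */
#define MAXTITLE  79   /* max. length of a title line in the output */
```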
@eladsal Thanks for looking into this. A few points of clarification. The variable ERR_MAXMSG is defined in errormanager.h as 256. The errormanager is the only part of the code using this variable. The variable MAXMSG is defined in messages.h as 53 to reflect the actual maximum message length.
The outputapi is being compiled as a completely separate unit from EPANET, so there is absolutely no way for these definitions of ERR_MAXMSG or MAXMSG to affect how the binary file is being written. Therefore, I do not agree with your assessment. This is not the reason why the build is currently failing. My best guess is that the build in the feature-nrtest branch is failing because of changes made in quality.c.
I am currently working on unit tests that establish the outputapi is working correctly. I will check them in when they are ready.
@michaeltryby - I think @eladsal was referring to how the binary file is written in the "2.2" development version, in output.c. If the length of a string is written as 255 but read as 53 (or whatever) via the outputapi reader (which is different code), then a problem may occur.
@samhatchett yes, that is what I meant.
@michaeltryby it is my understanding that we are comparing the binary output created by the new EPANET code to the binary output created by the trusted version of EPANET. If the two versions write the title strings to the binary output with different sizes, they must be read accordingly.
Regarding the quality.c file, you are right. There was a bug there which I fixed over the weekend on feature-wrapper (6e2a4e1). Also see 2fb20b4 for the MAXMSG fix.
@eladsal I see what you mean now. Great catch!
Yes, if the number of bytes in the title section changes, that would affect the calculation of the output start position.
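To make the mechanism concrete, here is a minimal sketch (the field counts and lengths are simplified assumptions, not the actual output.c layout):

```c
#include <stdio.h>

/* Sketch: the binary output prolog stores fixed-width title lines,
   so changing the assumed line length shifts every later offset. */
#define NINTS        15   /* leading integer fields (simplified) */
#define NTITLELINES   3   /* title lines in the prolog           */

static long prolog_bytes(int titlelen)
{
    return NINTS * 4L + NTITLELINES * (long)(titlelen + 1);
}

int main(void)
{
    /* A writer using 255-char titles vs. a reader assuming 79 chars:
       everything after the title section is off by this much. */
    printf("shift = %ld bytes\n", prolog_bytes(255) - prolog_bytes(79));
    return 0;
}
```

With these numbers the reader would look for the results section 528 bytes too early and misdecode every value after the title block.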
Expanded the test suite in PR #166. The current build is failing for a wider range of examples now.
SWMM is currently building with a green badge on AppVeyor. It would be a significant milestone for EPANET to have a green badge as well.
Hi,
I don't know why the tests are failing, but I have done some checking.
The first network which fails is 2PRVs-aquis; see here on AppVeyor. I compared the original 2.0.12 version with the one I compiled from the dev branch. Apart from the version numbers, the two rpt files are identical, including the running times (my test script managed to run the two networks within the same second) - see attached files. Also, the values for all results are exactly the same. Compilation was done with SDK 7.1, which is Visual Studio 2010 Service Pack 1 (16.00.40219).
As far as I can see, the worst error, for all networks, is at the third digit after the decimal point.
There are several possibilities that may be causing the tests to fail:
- The current benchmark version of EPANET 2.0.12 was last compiled using Visual Studio 6. AppVeyor is configured to build OWA EPANET using Visual Studio 10 (VS 6 is not available on AppVeyor).
I recompiled EPANET 2.0.12 using Visual Studio 10 2010 and created a set of benchmarks with it (located here). We can use these to eliminate differences in the benchmarks due to the VS6 --> VS10 compiler port and focus solely on finding and fixing bugs.
@michaeltryby have you compared the VS6 binary files to the VS10 files?
Yeah. 27 tests pass and 18 fail. Here are the results I got: receipt.txt
I have seen many cases in the report files where one shows a value of 0.00 and the other shows -0.00. Do you consider this a failure?
@eladsal Yes, the way the report file comparison works, that would be flagged as a failure.
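For what it's worth, the -0.00 values are a formatting artifact rather than a numerical error: an IEEE signed zero, or a tiny negative round-off residual, prints with a minus sign under %.2f:

```c
#include <stdio.h>

int main(void)
{
    printf("%.2f\n",  0.0);    /* 0.00                       */
    printf("%.2f\n", -0.0);    /* -0.00 (IEEE signed zero)   */
    printf("%.2f\n", -1e-9);   /* -0.00 (round-off residual) */
    return 0;
}
```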
When I compile the current OWA/dev branch and compare it to EPANET 2.0.12, both compiled with Visual Studio 10 2010, I get 28 tests passing and 17 failing. receipt.txt
I was hoping to see a bigger difference. :pensive:
@michaeltryby can you please post the output and report files for the 2PRVs-aquis network?
@eladsal for what version? I am juggling three different versions on my machine.
So you've got 2.0.12 on VS 6 (the original), and then 2.0.12 and 2.1 on VS 10, right? I would start with the 2.1 version.
I just updated the epanet-example-networks release. It currently contains the following benchmarks:
I meant 2.2 and not 2.1.
Just checked the 3 versions of the report file for 2PRVs-aquis using WinMerge, and these files are exactly the same apart from the run times and version numbers.
I see that there is a report summary being generated for this network; might that be the problem? Also, the footer is 3 lines, since there is a blank line between the two run-time lines.
Ok I will take a look.
@eladsal Looks like some files are failing because Summary wasn't set to No. I am working on fixing it now. Thanks for catching that!
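For reference, the switch in question lives in the [REPORT] section of the .inp file (option names per the EPANET users manual):

```
[REPORT]
 Summary   No    ;suppress the summary table so report files compare cleanly
 Status    No
```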
Eliminated some false positives, but there are still differences to investigate.
REF: epanet 2.0.12 vs6, SUT: epanet 2.0.12 vs10
32 tests passing and 13 tests failing
receipt.txt
At this point I think the majority of the differences can be attributed to the compiler changing from VS6 to VS10. I will investigate more tomorrow.
@michaeltryby I'm now looking at the 6848-68nodes network. According to the receipt.txt file there are small differences in some values, in the range of 0.0001 (absolute). I'm getting these absolute differences (which are very small):
net.txt
However, as I understand it, the test which is running uses a relative diff value rtol=0.1 (the absolute tolerance is zero). This means that the diff between the test and base values would have to be larger than 10% of the absolute base value for the test to fail. Looking again at the values in the receipt.txt file (x and y arrays), I don't see differences of that size, so I'm not sure why this is failing.
@eladsal good point! I will try and run that to ground.
@eladsal and @samhatchett, I finally got a chance to look at this. I have some good news! When I run REF: epanet 2.0.12 vs10, SUT: epanet 2.2.0dev vs10, 45 tests pass and 0 tests fail (receipt.txt). This means that 100% of the failing tests can be attributed to the VS6 to VS10 compiler port. I double-checked my testing rig to confirm, so I'm reasonably certain that this is the case.
How would you like to proceed?
@michaeltryby and @samhatchett, if you were to skip VS2010 and port all the way to a newer version of VS, which compiler would you choose? This is going to be the same question on the SWMM side.
Does this also fix the Linux build regression?
@michaeltryby good news! It was failing before, right? What was the problem?
BTW, the receipt shows that the ref version is 2.2.0dev and the test one is 2.0.12.
@samhatchett No, that does not fix the Linux build regression. @eladsal Yeah, they are reversed, but it doesn't make a difference. The result is the same either way.
@bemcdonnell great question. We should discuss and come to a consensus decision.
CI is running on this repo to help us maintain quality control. The build is currently failing! Maintaining the build in a passing state should be a high priority of the project.
Closer examination of the regression tests indicates that the water quality results have changed relative to benchmark version 2.0.12. I believe this may be related to code modifications made for sequential hydraulic and water quality simulation - i.e., "LemonTiger".
Do you have any preferences on how this issue should be resolved? Sequential WQ simulation is already implemented in the epanet-dev repo. Should we roll back or fix and maintain it here?