Open epag opened 2 months ago
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T19:55:55Z
To this end I am generating some system test coverage reports using jacoco.
I can't see a straightforward way to automatically generate a "which code does test A cover that test B does not cover" report.
But I can at least produce one report for each system test, and I can merge all of the system test coverage together to see which code is not covered by any system test. The latter may be useful information.
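(A hedged sketch of one manual approximation, not a command from this thread: compare two per-test CSV reports at class granularity, assuming the standard jacoco CSV columns shown further down; the file names are illustrative.)
# Classes with instruction coverage in scenario A but not in scenario B;
# column 5 of the jacoco CSV is INSTRUCTION_COVERED, columns 2-3 are
# PACKAGE and CLASS. File names here are hypothetical.
comm -23 \
    <(awk -F',' 'NR > 1 && $5 > 0 { print $2 "." $3 }' jacoco_scenarioA.csv | sort) \
    <(awk -F',' 'NR > 1 && $5 > 0 { print $2 "." $3 }' jacoco_scenarioB.csv | sort)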
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T20:29:10Z
The command used inside my wres.sh to ensure the exec file is generated is:
# Double quotes (not single) so that $WRES_LOG_LEVEL expands inside the script:
JAVA_OPTS="-Dwres.logLevel=$WRES_LOG_LEVEL -Dwres.url=nwcal-rdb.[host] -Dwres.databaseName=wres3 -Dwres.username=jesse.bickle -Dwres.password=CapitalismRocks -javaagent:/home/jesse/code/jacoco-0.8.1/lib/jacocoagent.jar=includes=wres*,append=false" $HOME/code/wres/build/install/wres/bin/wres "$@"
The guide for options on the agent is here: https://www.eclemma.org/jacoco/trunk/doc/agent.html
We use append=false so as to create a new coverage file per run, and use includes=wres* to not bother instrumenting non-wres code.
Then, to take the exec files plus the source and class files and generate a report, we should be able to use another gradle task (though I couldn't quickly get that working) or the newly-available jacoco cli.
Here's the cli command, looping over the system test directories and generating a report for each one that has a jacoco.exec file in it:
for dir in scenario*; do
    if [[ -f $dir/jacoco.exec ]]; then
        java -jar /home/jesse/code/jacoco-0.8.1/lib/jacococli.jar report $dir/jacoco.exec \
            --classfiles ../build/classes/java/main \
            --classfiles ../wres-config/build/classes/java/main \
            --classfiles ../wres-datamodel/build/classes/java/main \
            --classfiles ../wres-grid/build/classes/java/main \
            --classfiles ../wres-io/build/classes/java/main \
            --classfiles ../wres-messages/build/classes/java/main \
            --classfiles ../wres-metrics/build/classes/java/main \
            --classfiles ../wres-system/build/classes/java/main \
            --classfiles ../wres-tasker/build/classes/java/main \
            --classfiles ../wres-util/build/classes/java/main \
            --classfiles ../wres-vis/build/classes/java/main \
            --classfiles ../wres-worker/build/classes/java/main \
            --sourcefiles ../src \
            --sourcefiles ../wres-config/src \
            --sourcefiles ../wres-datamodel/src \
            --sourcefiles ../wres-grid/src \
            --sourcefiles ../wres-io/src \
            --sourcefiles ../wres-messages/src \
            --sourcefiles ../wres-metrics/src \
            --sourcefiles ../wres-system/src \
            --sourcefiles ../wres-tasker/src \
            --sourcefiles ../wres-util/src \
            --sourcefiles ../wres-vis/src \
            --sourcefiles ../wres-worker/src \
            --html ../build/reports/jacoco${dir} \
            --csv ../build/reports/jacoco${dir}.csv
    fi
done
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T20:49:11Z
Merging scenario0* together into one report:
find scenario0* -name "jacoco.exec" \
    -exec java -jar /home/jesse/code/jacoco-0.8.1/lib/jacococli.jar merge --destfile jacoco_scenario0xx.exec "{}" +
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T21:04:10Z
Example merged report of scenario0* attached.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T21:30:02Z
Some of the lack of coverage in this report is due to running against an already-populated database, e.g. wres.io.reading has close to 0 coverage.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T22:02:11Z
Investigating ways to evaluate relative contributions of tests.
First stab at it was to use the csv output: grep out the rows with 0 coverage from the scenario0* and combined scenario0xx reports so as to ignore those, then compare the values between files. But I noticed that sorting by file size right after filtering out the 0-coverage rows already yields results. The files all started at 45,9xx bytes, but after removing the 0-coverage lines, sorting the files by size gives a quick-and-dirty estimate of which tests are contributing more and less (always with reference to the combined coverage of that set of tests, which is scenario0xx in this case).
Before filtering:
-rw-rw-r--. 1 jesse jesse 45945 Aug 10 15:44 jacoco_scenario001.csv
-rw-rw-r--. 1 jesse jesse 45953 Aug 10 15:44 jacoco_scenario003.csv
-rw-rw-r--. 1 jesse jesse 45933 Aug 10 15:44 jacoco_scenario007.csv
-rw-rw-r--. 1 jesse jesse 45927 Aug 10 15:44 jacoco_scenario008.csv
-rw-rw-r--. 1 jesse jesse 45927 Aug 10 15:44 jacoco_scenario009.csv
-rw-rw-r--. 1 jesse jesse 45949 Aug 10 15:44 jacoco_scenario010.csv
-rw-rw-r--. 1 jesse jesse 45934 Aug 10 15:44 jacoco_scenario012.csv
-rw-rw-r--. 1 jesse jesse 45918 Aug 10 15:44 jacoco_scenario015.csv
-rw-rw-r--. 1 jesse jesse 45933 Aug 10 15:44 jacoco_scenario016.csv
-rw-rw-r--. 1 jesse jesse 45948 Aug 10 15:44 jacoco_scenario017.csv
-rw-rw-r--. 1 jesse jesse 45931 Aug 10 15:44 jacoco_scenario018.csv
-rw-rw-r--. 1 jesse jesse 45936 Aug 10 15:44 jacoco_scenario019.csv
-rw-rw-r--. 1 jesse jesse 45980 Aug 10 15:53 jacoco_scenario0xx.csv
Header of the file:
GROUP,PACKAGE,CLASS,INSTRUCTION_MISSED,INSTRUCTION_COVERED,BRANCH_MISSED,BRANCH_COVERED,LINE_MISSED,LINE_COVERED,COMPLEXITY_MISSED,COMPLEXITY_COVERED,METHOD_MISSED,METHOD_COVERED
And therefore filtering command:
for f in jacoco_scenario*.csv; do grep -v "[0-9],0,[0-9],0,[0-9],0,[0-9],0,[0-9],0" $f > ${f}only_with_coverage.csv ; done
Sorting the results by size:
$ ls -larS *only_with_coverage*csv
-rw-rw-r--. 1 jesse jesse 16125 Aug 10 16:53 jacoco_scenario015.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 16525 Aug 10 16:53 jacoco_scenario009.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 16529 Aug 10 16:53 jacoco_scenario008.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 17221 Aug 10 16:53 jacoco_scenario012.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 17255 Aug 10 16:53 jacoco_scenario019.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 17259 Aug 10 16:53 jacoco_scenario018.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 17470 Aug 10 16:53 jacoco_scenario016.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 17789 Aug 10 16:53 jacoco_scenario007.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 18865 Aug 10 16:53 jacoco_scenario010.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 18947 Aug 10 16:53 jacoco_scenario001.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 19205 Aug 10 16:53 jacoco_scenario017.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 22284 Aug 10 16:53 jacoco_scenario003.csvonly_with_coverage.csv
-rw-rw-r--. 1 jesse jesse 23558 Aug 10 16:53 jacoco_scenario0xx.csvonly_with_coverage.csv
Right off the bat there is an ordering of contributions to test coverage relative to the combined report. It appears that scenario003 contributes a lot. The question then is: which tests contribute the difference between scenario003 and the merged amount?
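(As a hedged aside, not a command from this thread: a more direct proxy than file size would be to count the classes with any covered instruction per report, assuming the header above.)
# Column 5 is INSTRUCTION_COVERED; skip the header row, count matching rows.
for f in jacoco_scenario*.csv; do
    n=$( awk -F',' 'NR > 1 && $5 > 0' "$f" | wc -l )
    echo "$f: $n classes with coverage"
done | sort -t':' -k2 -n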
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T22:05:24Z
And it's not so simple as counting rows with non-zero test coverage: scenario003 gives 31% instruction coverage and 23% branch coverage, while the merged all-of-scenario0xx gives 38% instruction coverage and 30% branch coverage.
Original Redmine Comment Author Name: James (James) Original Date: 2018-08-10T22:06:56Z
Nice. Not at all surprised to see you mention scenario003, as that does generate a ton of different outputs. It's the first one I run after unit testing FWIW. I would expect tests that are orthogonal to 003 to include 801 and 1000 and whatever test does gridded outputs (although they may not touch a lot of code like 003).
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T22:07:20Z
Each row in the csv is a class.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-10T22:58:57Z
Which classes are not in scenario003 that got coverage elsewhere?
diff --suppress-common-lines -y <(cut -d',' -f2-3 jacoco_scenario003.csvonly_with_coverage.csv | sort) <(cut -d',' -f2-3 jacoco_scenario0xx.csvonly_with_coverage.csv | sort)
> wres.config.generated,DateCondition
> wres.config.generated,DoubleBoundsType
> wres.config.generated,EnsembleCondition
> wres.config.generated,GraphicalType.Config
> wres.config.generated,PoolingWindowConfig
> wres.engine.statistics.metric.processing,MetricProcessorByTim
> wres.io.data.caching,Ensembles
> wres.io.data.details,EnsembleDetails
> wres.io.data.details,EnsembleDetails.EnsembleKey
> wres.io.retrieval,ClimatologyBuilder
> wres.io.retrieval,ClimatologyBuilder.DateRange
> wres.io.retrieval,ClimatologyBuilder.PrepareDateLoad
> wres.io.retrieval,PoolingMetricInputIterator
> wres.io.retrieval.scripting,PoolingForecastScripter
> wres.vis,ScoreOutputByLeadAndThresholdXYDataset
> wres.vis,XYChartDataSourceFactory.new DefaultXYChartDataSourc
> wres.vis,XYChartDataSourceFactory.new DefaultXYChartDataSourc
Which scenario0* are most different from one another in terms of classes with non-zero coverage?
for f in *[0-9][0-9][0-9].csvonly_with_coverage.csv; do
    for g in *[0-9][0-9][0-9].csvonly_with_coverage.csv; do
        diff --suppress-common-lines -y <(cut -d',' -f2-3 $f | sort) <(cut -d',' -f2-3 $g | sort) \
            | wc -l | xargs echo "diff of $f to $g "
    done
done | sort -k 6 -n
It spans from a 1-class-with-coverage difference between 001 and 010 all the way to an 81-class-with-coverage difference between 003 and 015.
Least different by classes-with-coverage from one another among scenario0*: 001 and 010 (1), 008 and 009 (2), 001 and 017 (3), 010 and 017 (4), 018 and 019 (6), 016 and 018 (7), 009 and 018 (8), 008 and 018 (10); the next difference is 14.
Most different: 015 and 003 (81), 009 and 003 (72), 008 and 003 (72), 012 and 003 (69), 019 and 003 (67), 018 and 003 (64), 016 and 003 (61), 007 and 003 (58), 017 and 015 (48), 010 and 015 (48), 001 and 015 (47), 017 and 003 (44), 012 and 016 (42), 010 and 003 (42).
So if we kept 003, 015, and 010, would we end up with close to the same combined coverage as all of scenario0*?
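(A hedged sketch of how that question could be checked with the jacoco cli, reusing the merge command from above; paths are as in the earlier commands:)
# Merge just these three scenarios' exec files, then generate a report from
# the merged file with the same report command used for the other files.
java -jar /home/jesse/code/jacoco-0.8.1/lib/jacococli.jar merge \
    scenario003/jacoco.exec scenario010/jacoco.exec scenario015/jacoco.exec \
    --destfile jacoco_scenario003_010_015.exec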
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-13T12:35:24Z
After thinking more clearly about it since Friday, coverage is only one angle from which tests are valuable. If the test is testing something different, or testing a different feature, and has a good assertion, it is still valuable.
Still interested in "which code is not touched by any system test", which I can produce after completing all tests. Scenario750 is where the devtests are stuck for me.
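(A hedged sketch of that combined "untouched by any system test" merge, generalizing the scenario0* merge above; the report step is the same as the earlier report command, with its class and source paths elided here:)
# Merge every scenario's exec file into one, then report on the merged file.
find scenario* -name "jacoco.exec" \
    -exec java -jar /home/jesse/code/jacoco-0.8.1/lib/jacococli.jar merge \
    --destfile jacoco_all_scenarios.exec "{}" +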
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-15T19:46:40Z
I have run these scenarios and combined their coverage into a jacoco report: scenario001 scenario003 scenario007 scenario008 scenario009 scenario010 scenario012 scenario015 scenario016 scenario017 scenario018 scenario019 scenario1000 scenario1001 scenario100 scenario101 scenario102 scenario103 scenario104 scenario105 scenario106 scenario107 scenario200 scenario300 scenario301 scenario302 scenario303 scenario304 scenario305 scenario400 scenario401 scenario402 scenario403 scenario404 scenario405 scenario407 scenario408 scenario409 scenario500 scenario501 scenario502 scenario504 scenario506 scenario600 scenario601 scenario700 scenario701 scenario702 scenario704 scenario705 scenario706
Attached both as csv and the html report (in the tar.gz)
Assuming I set everything up correctly, instruction coverage of the system tests is 56% and branch coverage is 44%; you can see what's not covered by examining the html report.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-15T19:55:38Z
Combined coverage of the junit tests and the system tests listed in the previous comment yields 63% instruction coverage and 53% branch coverage.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-15T20:28:47Z
(To answer Friday's question "So if we kept 003, 015, and 010, would we end up with close to the same combined coverage as all of scenario0*?": kind of but not really, 34%/27%; and in any case this ignores whether each test is testing an important thing.)
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-08-15T21:42:21Z
scenarios ordered by instruction coverage percentage:
build/reports/jacoco_scenario502.csv: instructions=0.2197933421065458 branches=0.15121241513094083
build/reports/jacoco_scenario506.csv: instructions=0.22636512810514858 branches=0.15577109602327838
build/reports/jacoco_scenario302.csv: instructions=0.22845282773634873 branches=0.15518913676042678
build/reports/jacoco_scenario015.csv: instructions=0.22891585839385226 branches=0.1597478176527643
build/reports/jacoco_scenario301.csv: instructions=0.2335949050380985 branches=0.15994180407371483
build/reports/jacoco_scenario601.csv: instructions=0.2383714318207665 branches=0.1646944713870029
build/reports/jacoco_scenario008.csv: instructions=0.23885070916800702 branches=0.16896217264791463
build/reports/jacoco_scenario408.csv: instructions=0.2411089990414453 branches=0.1676042677012609
build/reports/jacoco_scenario405.csv: instructions=0.2440740199184416 branches=0.16964112512124152
build/reports/jacoco_scenario012.csv: instructions=0.24506506799239655 branches=0.1681862269641125
build/reports/jacoco_scenario706.csv: instructions=0.24586927913437637 branches=0.16333656644034916
build/reports/jacoco_scenario101.csv: instructions=0.24687657389806827 branches=0.1726479146459748
build/reports/jacoco_scenario104.csv: instructions=0.24783512859254928 branches=0.16857419980601357
build/reports/jacoco_scenario200.csv: instructions=0.24794073207583955 branches=0.18292919495635304
build/reports/jacoco_scenario106.csv: instructions=0.25017465191467236 branches=0.17070805043646944
build/reports/jacoco_scenario018.csv: instructions=0.2512306867475752 branches=0.17245392822502426
build/reports/jacoco_scenario407.csv: instructions=0.2522136114766616 branches=0.17710960232783704
build/reports/jacoco_scenario501.csv: instructions=0.2531802895160111 branches=0.17507274490785646
build/reports/jacoco_scenario019.csv: instructions=0.2535133466556189 branches=0.17536372453928226
build/reports/jacoco_scenario016.csv: instructions=0.2538545271400952 branches=0.18060135790494666
build/reports/jacoco_scenario701.csv: instructions=0.25484557521405016 branches=0.17740058195926287
build/reports/jacoco_scenario305.csv: instructions=0.2554873194586603 branches=0.17032007759456838
build/reports/jacoco_scenario007.csv: instructions=0.25565790970089847 branches=0.17895247332686712
build/reports/jacoco_scenario500.csv: instructions=0.2579730629884161 branches=0.17672162948593598
build/reports/jacoco_scenario402.csv: instructions=0.25932153823658427 branches=0.18331716779825413
build/reports/jacoco_scenario704.csv: instructions=0.26048317655277736 branches=0.17410281280310377
build/reports/jacoco_scenario705.csv: instructions=0.26275771311595264 branches=0.178176527643065
build/reports/jacoco_scenario304.csv: instructions=0.2651297298175497 branches=0.1950533462657614
build/reports/jacoco_scenario105.csv: instructions=0.2662751214440058 branches=0.19146459747817654
build/reports/jacoco_scenario303.csv: instructions=0.2663644782375591 branches=0.19165858389912707
build/reports/jacoco_scenario700.csv: instructions=0.2678510503484915 branches=0.18186226964112512
build/reports/jacoco_scenario403.csv: instructions=0.2694675959773196 branches=0.19873908826382153
build/reports/jacoco_scenario404.csv: instructions=0.2712791018829913 branches=0.19437439379243454
build/reports/jacoco_scenario504.csv: instructions=0.27314747120274246 branches=0.1930164888457808
build/reports/jacoco_scenario401.csv: instructions=0.27352926841156117 branches=0.1950533462657614
build/reports/jacoco_scenario702.csv: instructions=0.27566570811197216 branches=0.1834141610087294
build/reports/jacoco_scenario409.csv: instructions=0.2782326850904128 branches=0.19224054316197867
build/reports/jacoco_scenario017.csv: instructions=0.2787688258517327 branches=0.19262851600387973
build/reports/jacoco_scenario600.csv: instructions=0.2801091777550324 branches=0.18671193016488846
build/reports/jacoco_scenario009.csv: instructions=0.2811814592776722 branches=0.1935984481086324
build/reports/jacoco_scenario010.csv: instructions=0.2964371009406833 branches=0.2128031037827352
build/reports/jacoco_scenario107.csv: instructions=0.29661581452778996 branches=0.21454898157129002
build/reports/jacoco_scenario003.csv: instructions=0.2991746681613621 branches=0.20591658583899128
build/reports/jacoco_scenario103.csv: instructions=0.3009942974119023 branches=0.2140640155189137
build/reports/jacoco_scenario001.csv: instructions=0.31038488407986875 branches=0.21716779825412222
build/reports/jacoco_scenario300.csv: instructions=0.32296794528114897 branches=0.22337536372453928
build/reports/jacoco_scenario102.csv: instructions=0.3230004386606229 branches=0.227643064985451
build/reports/jacoco_scenario400.csv: instructions=0.33313837305648974 branches=0.23656644034917557
build/reports/jacoco_scenario100.csv: instructions=0.3532842683303277 branches=0.24645974781765276
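(A hedged sketch of how such fractions can be derived from the jacoco CSVs; not necessarily the exact script used:)
# Columns 4-7 are INSTRUCTION_MISSED, INSTRUCTION_COVERED, BRANCH_MISSED,
# BRANCH_COVERED; sum them per file and print covered/(covered+missed).
for f in build/reports/jacoco_scenario*.csv; do
    awk -F',' 'NR > 1 { im += $4; ic += $5; bm += $6; bc += $7 }
               END { printf "%s: instructions=%s branches=%s\n",
                     FILENAME, ic / (ic + im), bc / (bc + bm) }' "$f"
done | sort -t'=' -k2 -n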
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-09-12T17:52:42Z
Updating the list of items in the Description. I'll update the associated tickets later.
I believe the path we have discussed for redesigning the system tests is described as the "Alternative" for reducing the size of the data.
Thanks,
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-09-18T17:57:06Z
All:
Thinking about the general design of the system tests, I have the following proposals:
1. Functionality related to pairing and conditioning (including rescaling, restricting data by issued or valid dates, restricting data by value, pooling windows, etc.) and to calculating metrics (including thresholds) will be tested using manufactured, small data sets, similar to those in the 500 series.
2. For handling of data formats, there will be one system test per data format (PI-XML, NetCDF, datacard, etc.) and data category (observations, simulations, single-valued forecasts, and ensemble forecasts). Such tests will be very basic, focusing only on the ability to ingest data and on confirming the pairs. The idea is that, once the data is in the database, it all appears the same from the point of view of WRES. Thus, there is no need to test the functionality on a per-data-format basis. Hence, keep the input data format testing as simple as possible while focusing functionality testing on the smallest data set.
3. For gridded evaluation, we will maintain two data sets on the WRES Test VM: (1) the original data from NWM or whatever source; (2) "masked" proxy data small enough to share with developers via Google drive. On the WRES Test VM, we will test against the original data by default, with symbolic links allowing for switching between the original and masked data sets.
This will in general mean fewer tests outside of the 500 series, more 500 series tests, and a significantly quicker system test run time.
Does this seem like a reasonable approach to redesigning the system tests? Thanks,
Hank
Original Redmine Comment Author Name: Chris (Chris) Original Date: 2018-09-18T18:22:46Z
I'm in favor.
Alex has been using the CSV reader, so that's looking fairly solid. It's only for forecasts at the moment, but setting up sample data through csv rather than the PIXML sources and such might make life easier. As for observations, I'd say that it might take ~4hrs (writing the observation copy mechanism and copying and modifying the forecast logic to replace the current exception) to get that up and running.
Original Redmine Comment Author Name: James (James) Original Date: 2018-09-18T18:47:25Z
Hank,
Taking a step back, I think the system tests are intended to be our very top level of integration testing where we test all components of the system together, including any UIs once those are implemented.
Additionally, they are focused on software function and, therefore, we assert correctness. They are not about non-functional aspects, such as performance.
In that context, I agree with (1). The main thing about (1) is that each test should add something new and that all tests collectively should cover as much of the overall functionality as possible.
However, (2) sounds like unit testing to me, specifically unit tests that are functional tests. In that sense, I don't agree with (2), because it involves a specific software unit. Additionally, I don't agree with the assertion of correctness being about pairs, because ingesting formats is not about pairing. What those tests assert will depend on how the relevant software units are written and what they are intended to do (I haven't checked), but it might include something prior to the database or it might include data that has reached the database, but not pairs.
I think I agree with (3), although this isn't another category of test, it's more about expediency.
Also, since I mentioned UIs, I'm not sure how much of the system testing will include service/UI/usability testing or whether we break that out. Either way, I think we'll want to use existing frameworks for that, like Selenium.
James
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-09-18T20:13:09Z
James,
Not worrying about UIs, yet, but I have the same questions long term (how to integrate testing of UIs).
Back to the numbers... (3) is not about design, so I probably should have separated that a bit. Sorry for the lack of clarity.
Your comments on (2)...
I mentioned the pairs only because checking the pairs is the only conduit I have for checking that the data was properly ingested as part of system testing. The point is we need to confirm that the reader for the data format is appropriately inserting all necessary information into the database. I suppose that could be covered by a good unit test if such a test can check all the inserts and whatnot created by a reader for correctness. Alternatively, it could be covered by integration testing (which we also typically do through JUnit) by checking the contents of the database after the data is ingested. Regardless, I think I can agree that (2) sounds like unit testing, from a design point of view.
So, I guess from a design perspective, we can focus on (1).
However, until we have the necessary framework in place to test the data format handling via JUnit (unit or integration testing), should I continue to include it as part of system testing?
Thanks,
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2018-09-18T20:53:42Z
Right, I'm not sure about UIs. What I do know is that other people have thought about that and there are frameworks for automated testing various types of usability. Anyway, yes, let's focus on (1), i.e. functional testing at our highest level of integration, namely the system/service.
I wouldn't introduce any shortcuts to test formats. I don't believe there's a major rush to get these system tests refactored and we don't need to refactor all of them at once. Also, much of what you want to do on (1) is likely to indirectly test the correct reading of various shapes of data, if not formats, only we'd be making assertions about the statistical outputs, rather than pairs.
On (2), my vote would be to focus the tests of input formats on the software units responsible for reading specific formats and to ensure all assertions have an appropriate scope, i.e. let's not assert something about a side-effect that has a different scope, such as pairs. IIRC, the source readers are responsible for both reading data and copying/saving it to the database. I believe we had a discussion about separating the two at some point. Either way, the unit tests should have whatever scope the software units have - I can imagine that some of these source readers have behavior that we can test without testing database interaction, but some elements will require that we test database interaction, which we can do more easily now.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-10-01T14:37:46Z
Anything here: https://martinfowler.com/tags/testing.html
I think the most urgent article for us at the moment is https://martinfowler.com/articles/nonDeterminism.html
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-10-03T21:18:51Z
Neither system tests nor unit tests covered compareTo in FeatureDetails. Not that coverage would guarantee anything (especially if the test did not have hand-crafted assertions), but one or both might help. Thinking of #55781. I think the general situation is "more than one feature, specified by one of the non-locationId fields." However, testing may not be the answer for #55781 either, may be something else.
Original Redmine Comment Author Name: Chris (Chris) Original Date: 2018-10-04T13:52:50Z
The issue was the logic in the FeatureKey compareTo (not in FeatureDetails) and would have still been somewhat hard to find with a unit test. The issue was that the primary key in the Features cache was the lid, but we were looking for the proper entry with the partial key of the comid (which is what it was built for). Since the ordering of the FeatureKeys was based on lid, then comid, it was faltering: it was traversing the treemap based on that comid, whereas the data within was already properly ordered because all possible fields were populated. Because of how it was populated by lid, the comids weren't ordered at all. When it went to load, it sensed that the first key in the tree was larger than the one being checked, tried to look for the prior key, and failed because there were no keys in the cache lower than that first encountered key. The matching value was in the cache, but the normal ordering was reversed due to the ordering based on lid.
We should still have those unit tests, but I don't think they would have had a high chance of highlighting this issue.
I'm sorry if that was a convoluted explanation; I'm still waking up.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-10-04T14:01:22Z
The file Hank modified was called FeatureDetails.java; I should have been more specific.
Chris, your in-the-weeds analysis sounds fine to me. My analysis is simple: it didn't work because we didn't test it. Whether that's a manual, unit, or system test doesn't matter. If we or the test software aren't regularly exercising the functionality, it can and will break. So I don't think that conflicts with your analysis.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T16:27:27Z
Adding jacoco plugin to the system tests.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T16:30:23Z
Error when doing:
$ ./gradlew --no-daemon cleanTest test jacocoTestReport -PwresZipDirectory=../../../../build/distributions -PwresGraphicsZipDirectory=../../../../wres-vis/build/distributions -PversionToTest=20210510-dc74b7d-dev -PgraphicsVersionToTest=20210505-13953f0-dev -PtestJvmSystemProperties="-Dwres.systemTestSeed=235 -Dwres.useSSL=false -Dwres.databaseHost=localhost -Dwres.databaseName=wres -Dwres.username=wres_user -Djava.awt.headless=true -Dwres.dataDirectory=C:/wres_system_test_data -Djava.io.tmpdir=$(cygpath -aw .) -Ducar.unidata.io.http.maxReadCacheSize=200000 -Ducar.unidata.io.http.httpBufferSize=200000 -Dwres.attemptToMigrate=false -Duser.timezone=UTC" --tests=Scenario001 --tests=Scenario003 --tests=Scenario009 --tests=Scenario102 --tests=Scenario105 --tests=Scenario107 --tests=Scenario300 --tests=Scenario400 --tests=Scenario801
FAILURE: Build failed with an exception.
* What went wrong:
Problem configuring task :jacocoTestReport from command line.
> Unknown command-line option '--tests'.
But when doing it like the following (moving the jacocoTestReport task to after all the --tests arguments), it proceeds:
$ ./gradlew --no-daemon cleanTest test -PwresZipDirectory=../../../../build/distributions -PwresGraphicsZipDirectory=../../../../wres-vis/build/distributions -PversionToTest=20210510-dc74b7d-dev -PgraphicsVersionToTest=20210505-13953f0-dev -PtestJvmSystemProperties="-Dwres.systemTestSeed=235 -Dwres.useSSL=false -Dwres.databaseHost=localhost -Dwres.databaseName=wres -Dwres.username=wres_user -Djava.awt.headless=true -Dwres.dataDirectory=C:/wres_system_test_data -Djava.io.tmpdir=$(cygpath -aw .) -Ducar.unidata.io.http.maxReadCacheSize=200000 -Ducar.unidata.io.http.httpBufferSize=200000 -Dwres.attemptToMigrate=false -Duser.timezone=UTC" --tests=Scenario001 --tests=Scenario003 --tests=Scenario009 --tests=Scenario102 --tests=Scenario105 --tests=Scenario107 --tests=Scenario300 --tests=Scenario400 --tests=Scenario801 jacocoTestReport
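(The underlying rule, for the record: Gradle binds a task-scoped option such as --tests to the most recently named task, so jacocoTestReport must not sit between test and its --tests options. Illustrative shapes only:)
# Works: --tests binds to the test task, jacocoTestReport follows.
./gradlew test --tests=Scenario001 jacocoTestReport
# Fails: --tests now binds to jacocoTestReport, which has no such option.
./gradlew test jacocoTestReport --tests=Scenario001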
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T19:34:17Z
It was easy to get the test.exec to have wres coverage information, but getting it into the report is another story altogether.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T20:36:09Z
At the bottom of the innermost @build.gradle@ here is what I have so far:
test
{
    // ...
    jacoco
    {
        includes = [ "wres.*" ]
    }
}

jacocoTestReport
{
    reports
    {
        additionalSourceDirs.setFrom( fileTree( dir: buildDir,
                                                includes: [ "wres-*/src/*.jar" ] ) )
        xml.enabled false
        csv.enabled true
        html.enabled true
    }

    doLast { println( additionalSourceDirs.asList() ) }
}
The jacoco block in the test block is enough to get the test.exec binary to have only wres stuff inside. The html report shows execution data from wres classes, like this:
Execution data for the following classes is considered in this report:
Class Id
wres.ExecutionResult 07eae356076bc519
wres.config.ProjectConfigPlus fa909eff8ae2a8a0
wres.config.ProjectConfigPlus.ProjValidationEventHandler dd55832cea4c048b
wres.config.ProjectConfigs e7664393f4cd934a
...
And the jars are printed by that println:
> Task :jacocoTestReport
[C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-config-20210507-26bb8a7-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-datamodel-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-events-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-eventsbroker-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-io-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-metrics-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-statistics-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-system-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-util-20210505-13953f0-dev-sources.jar,
 C:\Users\User.Name\code\wres\systests\build\install\systests\build\wres-20210510-dc74b7d-dev\src\wres-vis-20210505-13953f0-dev-sources.jar]
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T20:44:42Z
Changing around the directory and includes to this:
additionalSourceDirs.setFrom( fileTree( dir: "${buildDir}/wres-*/src/",
include: "*.jar" ) )
Yields an empty list (guessing because the directory is evaluated at configuration time whereas the include is evaluated later; the wres src dir does not exist yet at configuration time, something like that).
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T20:53:11Z
Similar situation to #51655-38 with this:
jacocoTestReport
{
    reports
    {
        classDirectories.setFrom( fileTree( dir: buildDir,
                                            include: "wres-*/src/*.jar" ) )
        xml.enabled false
        csv.enabled true
        html.enabled true
    }

    doLast { println( classDirectories.asList() ) }
}
The jars are printed, but jacoco didn't see them as having the sources.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T22:16:34Z
After re-reading the message @No class files specified.@ and reviewing the properties, I thought I would try setting the binary jars in one property and the source jars in another property, and finally we have code coverage.
test
{
    // ...
    jacoco
    {
        includes = [ "wres.*" ]
    }
}

jacocoTestReport
{
    reports
    {
        classDirectories.setFrom( fileTree( dir: buildDir,
                                            include: "wres-*/lib/wres*.jar" ) )

        // jacoco still does not see the sources despite this following line.
        additionalSourceDirs.setFrom( fileTree( dir: buildDir,
                                                include: "wres-*/src/*.jar" ) )
        xml.enabled false
        csv.enabled true
        html.enabled true
    }

    // The list of jars is found, else these would not print successfully.
    doLast { println( classDirectories.asList() ) }
    doLast { println( additionalSourceDirs.asList() ) }
}
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T22:18:38Z
OK, not quite.
Source file "wres/io/reading/SourceLoader.java" was not found during generation of report.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-05-11T22:37:58Z
commit:06922082fbfee0dd5d0c661be4aee34631da0289 has the progress on the report, which shows the high-level details but does not show the exact lines of source files covered/uncovered/etc.
Author Name: Hank (Hank) Original Redmine Issue: 51655, https://vlab.noaa.gov/redmine/issues/51655 Original Date: 2018-01-08
EDIT: Original ticket description edited to account for updated desires.
Aspects include:
(Action Cancelled!) Splitting the system testing repo from the development repo (#47419)
Reduce the size of the data for 000-800 (+1000) series (see subtasks)
Allow for system tests to be run via JUnit; again 000-800 (+1000)
Establishing benchmarks for all tests
Additionally, we need to rethink the system tests and perhaps redesign them. The system tests should focus on testing new pathways through the software, end-to-end, not necessarily on the effects of parameters of individual components of the software... unless those effects are insufficiently tested via unit testing. The Alternative, above, gets at that point.
This is an umbrella task. Subtasks will be linked for the items above, except for #47419 which is related to both development and system testing and will therefore just be a related task.
Setting as urgent in the Backlog, but it is already on-going with Release 1.3, and the subtasks should reflect the on-going changes. Adding watchers.
Hank
Redmine related issue(s): 47419
Child issue(s): #186
Redmine child issue(s): 45173, 49443, 51653, 51654, 55670