NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a system tester, I don't want scenario303 to fail on a benchmark comparison #157

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: James (James) Original Redmine Issue: 125130, https://vlab.noaa.gov/redmine/issues/125130 Original Date: 2024-01-11


Given a system test of scenario303 at nwcal
When the test completes
Then I expect it to succeed and not fail with an exception on a benchmark comparison

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-11T11:26:38Z


Bizarre one this, seemingly unrelated to the associated commit and not failing locally.

2024-01-10T19:39:04.134+0000 ERROR Scenario303 testScenario(wres.systests.Scenario303)
    junit.framework.AssertionFailedError: Comparison with benchmarks failed with code 32. expected:<0> but was:<32>
        at junit.framework.Assert.fail(Assert.java:57)
        at junit.framework.Assert.failNotEquals(Assert.java:329)
        at junit.framework.Assert.assertEquals(Assert.java:78)
        at junit.framework.Assert.assertEquals(Assert.java:234)
        at junit.framework.TestCase.assertEquals(TestCase.java:377)
        at wres.systests.ScenarioHelper.assertOutputsMatchBenchmarks(ScenarioHelper.java:220)
        at wres.systests.Scenario303.testScenario(Scenario303.java:82)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.junit.runners.Suite.runChild(Suite.java:128)
        at org.junit.runners.Suite.runChild(Suite.java:27)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:108)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:57)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:39)
        at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
        at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
        at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
        at jdk.proxy2/jdk.proxy2.$Proxy30.processTestClass(Unknown Source)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
        at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:113)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

Will need to get the actual output to see how it differs.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-11T11:27:16Z


89538-1388

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-01-11T11:28:34Z


Hank, could you grab this and post here?

2024-01-10T19:39:04.099+0000 WARN ScenarioHelper The metric CSV file differs from /wres_share/releases/systests-20231130-964129c/scenario303/benchmarks/LGNN5_LGNN5_HEFS_MEAN_ERROR.csv (result code 32) for file with name LGNN5_LGNN5_HEFS_MEAN_ERROR.csv

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-01-11T12:14:11Z


I'll do so as soon as I can. Still catching up on emails,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-01-11T12:26:39Z


First challenge: getting the complete file name of the CSV file being compared with the benchmark. There are quite a few in the @outputs@ folder that match "LGNN5_LGNN5_HEFS_MEAN_ERROR.csv". Looking at the log,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-01-11T12:37:05Z


Found the evaluation identifier in the first @INFO@ line shown below:

    2024-01-10T19:39:04.086+0000 INFO EvaluationUtilities The messager for evaluation fZ_hvDuK39Tl1PH0saDf1zr_woU has been closed.
    2024-01-10T19:39:04.093+0000 INFO Scenario303 Checking expected file names against actual file names that exist for 3 files...
    2024-01-10T19:39:04.094+0000 INFO Scenario303 Finished checking file names. The actual file names match the expected file names.
    2024-01-10T19:39:04.095+0000 INFO ScenarioHelper Asserting that outputs match benchmarks for scenario303...
    2024-01-10T19:39:04.099+0000 WARN ScenarioHelper The metric CSV file differs from /wres_share/releases/systests-20231130-964129c/scenario303/benchmarks/LGNN5_LGNN5_HEFS_MEAN_ERROR.csv (result code 32) for file with name LGNN5_LGNN5_HEFS_MEAN_ERROR.csv
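
For anyone repeating this lookup, the step can be scripted. The following is a minimal sketch and not part of the WRES tooling: the function names are hypothetical, the regular expression assumes the log message wording shown above, and the `wres_evaluation_<identifier>` directory naming is assumed from this one run.

```python
import re
from pathlib import PurePosixPath

# Message wording copied from the log above; other WRES versions may differ.
EVALUATION_CLOSED = re.compile(r"The messager for evaluation (\S+) has been closed\.")


def find_evaluation_id(log_lines):
    """Return the first evaluation identifier found in the log lines, or None."""
    for line in log_lines:
        match = EVALUATION_CLOSED.search(line)
        if match:
            return match.group(1)
    return None


def output_directory(outputs_root, evaluation_id):
    """Build the per-evaluation output directory (naming assumed from this run)."""
    return PurePosixPath(outputs_root) / f"wres_evaluation_{evaluation_id}"


log = [
    "2024-01-10T19:39:04.086+0000 INFO EvaluationUtilities The messager for "
    "evaluation fZ_hvDuK39Tl1PH0saDf1zr_woU has been closed.",
    "2024-01-10T19:39:04.095+0000 INFO ScenarioHelper Asserting that outputs "
    "match benchmarks for scenario303...",
]
evaluation_id = find_evaluation_id(log)
print(output_directory("/wres_share/releases/systests-20231130-964129c/outputs", evaluation_id))
```

If several evaluations are logged to the same file, the first match may not be the one that failed, so the line nearest the WARN message is the safer one to use.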

That identifier allowed me to find the output in @/wres_share/releases/systests-20231130-964129c/outputs/wres_evaluation_fZ_hvDuK39Tl1PH0saDf1zr_woU@. Here is the difference:

    [Hank@nwcal-wres-ti01 wres_evaluation_fZ_hvDuK39Tl1PH0saDf1zr_woU]$ diff LGNN5_LGNN5_HEFS_MEAN_ERROR.csv /wres_share/releases/systests-20231130-964129c/scenario303/benchmarks/LGNN5_LGNN5_HEFS_MEAN_ERROR.csv
    8c8
    < LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,151200,151200,0.005737,-0.496730,-1.090937,-2.134435
    ---
    > LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,151200,151200,0.005737,-0.496730,-1.090938,-2.134435

It's the last decimal place of the second to last statistic: -1.090937 vs. -1.090938. The entire output file is below (since it's small).

Thanks,

Hank

=============================================================

FEATURE DESCRIPTION,EARLIEST ISSUE TIME,LATEST ISSUE TIME,EARLIEST VALID TIME,LATEST VALID TIME,EARLIEST LEAD TIME IN SECONDS [UNKNOWN OVER PAST 21600 SECONDS],LATEST LEAD TIME IN SECONDS [UNKNOWN OVER PAST 21600 SECONDS],MEAN ERROR All data,MEAN ERROR > 0.0 MM [Pr = 0.75],MEAN ERROR > 0.23 MM [Pr = 0.9],MEAN ERROR > 1.24 MM [Pr = 0.95]
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,21600,21600,-0.127101,-1.107134,-2.155417,-3.792708
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,43200,43200,-0.083818,-0.505783,-1.155243,-5.946354
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,64800,64800,-0.025635,-0.671685,-1.282309,-2.216042
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,86400,86400,-0.299960,-4.298375,-5.573420,-7.661227
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,108000,108000,-0.123269,-1.124710,-2.010443,-3.610833
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,129600,129600,-0.056109,-0.463392,-1.192157,-5.733490
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,151200,151200,0.005737,-0.496730,-1.090937,-2.134435
LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,172800,172800,-0.230831,-3.963556,-5.807118,-7.664792
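
A plain `diff` flags any textual change, so it cannot tell a one-unit change in the last printed digit from a real regression. As an illustration only, a field-by-field comparison with a small absolute tolerance would look something like the sketch below. The `differing_fields` helper and the `1.5e-6` tolerance are hypothetical, and this is not how `ScenarioHelper` compares outputs with benchmarks.

```python
import math


def differing_fields(actual_row, benchmark_row, tolerance=1.5e-6):
    """Return (column index, actual, benchmark) for each field that differs.

    Fields that parse as numbers are compared within an absolute tolerance;
    all other fields (feature names, timestamps) must match exactly.
    """
    actual = actual_row.split(",")
    benchmark = benchmark_row.split(",")
    if len(actual) != len(benchmark):
        raise ValueError("rows have different numbers of fields")
    differences = []
    for index, (a, b) in enumerate(zip(actual, benchmark)):
        try:
            equal = math.isclose(float(a), float(b), rel_tol=0.0, abs_tol=tolerance)
        except ValueError:
            equal = a == b  # Not numeric: fall back to exact text comparison
        if not equal:
            differences.append((index, a, b))
    return differences


# The two versions of line 8 reported by diff above.
actual = ("LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,"
          "-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,"
          "151200,151200,0.005737,-0.496730,-1.090937,-2.134435")
benchmark = ("LGNN5,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,"
             "-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,"
             "151200,151200,0.005737,-0.496730,-1.090938,-2.134435")

print(differing_fields(actual, benchmark))                 # tolerant comparison
print(differing_fields(actual, benchmark, tolerance=0.0))  # exact comparison
```

With the tolerance set to zero the helper reports the same single field that `diff` found; with the default tolerance the two rows are treated as equal.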

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-02-16T15:25:00Z


Curious one, but it will need to wait.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Evan (Evan) Original Date: 2024-03-19T14:19:07Z


6.21 is going to be just a docker deploy, moving all 6.21 tickets to 6.22

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2024-05-10T17:06:01Z


What's the status of this? Move to 6.23 or the backlog?

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-05-10T17:09:19Z


The status is that it's unfixed. We could drop the d.p. on these benchmarks:

    decimal_format: '#0.000000'

However, I wouldn't really expect a difference at 6 d.p. Hey ho.
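
To make the trade-off concrete, here is a small sketch of what formatting to fewer decimal places would do to the two values from the diff, and of one way a sixth-place difference can arise at all. It is not a diagnosis of this failure. It uses Python's `decimal` module rather than the WRES formatter, assumes half-even rounding (the default for `java.text.DecimalFormat`, which the `decimal_format` pattern resembles), and the two near-tie inputs are invented for illustration rather than taken from the real unrounded statistic.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_dp(value, places):
    """Round a decimal string to a fixed number of places using half-even rounding."""
    quantum = Decimal(1).scaleb(-places)  # places=6 gives a quantum of 0.000001
    return str(Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN))


# At 5 d.p. the actual and benchmark values from the diff agree.
print(format_dp("-1.090937", 5), format_dp("-1.090938", 5))

# Hypothetical unrounded values on either side of a rounding tie: a change far
# below the sixth decimal place is enough to flip the sixth printed digit.
print(format_dp("-1.0909375", 6), format_dp("-1.09093749999", 6))
```

Fewer decimal places make such a flip less likely but cannot rule it out, because a value can sit next to a rounding boundary at any precision.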

epag commented 3 weeks ago

Original Redmine Comment Author Name: Evan (Evan) Original Date: 2024-05-21T16:21:06Z


Moving this to 6.24, 6.23 is going to be a docker only deploy

epag commented 3 weeks ago

Original Redmine Comment Author Name: Evan (Evan) Original Date: 2024-07-02T13:18:10Z


No work done on this in this sprint

HankHerr-NOAA commented 1 day ago

The issue described in this ticket cropped up when updating dependencies to research #68 and support #313 (6.26). Just noting that here,

Hank