NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I don't want an OOME on system test scenario009 when using h2 with a small but reasonable memory footprint #76

Open epag opened 2 months ago

epag commented 2 months ago

Author Name: James (James) Original Redmine Issue: 103766, https://vlab.noaa.gov/redmine/issues/103766 Original Date: 2022-04-14


Given an evaluation of scenario009 (that happened to be in system test mode with `-Xmx512m`)
When that evaluation proceeds
Then it shouldn't OOME

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:04:15Z


For some absolutely bizarre reason I am getting a 403 on attempting to add the stack trace here (also in #103431), so I am attaching instead.

edit: oh, attachments not allowed either. Something is up w/ Redmine...

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:10:29Z


hprof uploaded to gdrive:

https://drive.google.com/drive/u/2/folders/1jREhjjJ2fUV99nnG7KG8wrATPdpsqMCR

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:15:27Z


Oh, my bad, I was using h2, not Postgres. Still, I think this stands: it should complete.

The other thing is that the system tests should really fail when any one test OOMEs, so there is an error being trapped somewhere. The next test (after the one that failed with an OOME) has now been stuck for the last 30 minutes with this trace at the bottom of the log:

edit: sorry, I cannot post most of the trace due to issues with Redmine.

2022-04-14T11:44:56.952+0000  [Outer Reading/Ingest Thread 5 -> #177] ERROR wres.io.concurrency.IngestSaver - Callable task failed
.
.
Caused by: java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method)
    at org.h2.mvstore.MVMap.tryLock(MVMap.java:1923)
    ... 39 common frames omitted
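
For illustration only, here is a minimal sketch of one way an `OutOfMemoryError` raised on a worker thread can end up trapped, and how it could be rethrown so the calling test fails. It assumes the task runs through a plain `ExecutorService`; the class `ErrorPropagationSketch` and the method `runAndPropagate` are hypothetical names, not WRES code, and the OOME is simulated rather than produced by exhausting the heap.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ErrorPropagationSketch {

    /**
     * Hypothetical helper: runs a task on the executor and rethrows any
     * Error (for example an OutOfMemoryError) on the calling thread.
     */
    static <T> T runAndPropagate(ExecutorService executor, Callable<T> task)
            throws InterruptedException, ExecutionException {
        Future<T> future = executor.submit(task);
        try {
            return future.get();
        } catch (ExecutionException e) {
            // Future.get() wraps whatever the task threw, including Errors.
            // Logging and continuing here is one way an OOME gets trapped.
            if (e.getCause() instanceof Error) {
                throw (Error) e.getCause();
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Simulated failure: the real OOME came from ingest under -Xmx512m.
            runAndPropagate(executor, () -> {
                throw new OutOfMemoryError("simulated");
            });
        } catch (OutOfMemoryError e) {
            System.out.println("Propagated to caller: " + e.getMessage());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Whether the system tests trap the error in this way is an assumption. The heap dump and the full log would be needed to confirm where it is actually swallowed.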

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:17:20Z


Anyway, given this is h2 and we know that requires a lot of memory, this is probably not a super high priority. The improvements to the system tests can probably be made elsewhere.

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:19:06Z


Eventually it moved on and failed for another reason:

Caused by: wres.io.reading.IngestException: Another task did not ingest and complete (java.util.concurrent.CountDownLatch@730d3dd5[Count = 0],java.util.concurrent.CountDownLatch@57b2bda3[Count = 1]) within PT30M, therefore assuming it failed.
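
The message above is consistent with a bounded wait on a `CountDownLatch`. A minimal sketch of that pattern follows, assuming a timeout expressed as a `java.time.Duration`; the class `LatchTimeoutSketch`, the method `awaitOrFail`, and the use of `IllegalStateException` in place of `wres.io.reading.IngestException` are all hypothetical, and the demo uses a short timeout rather than `PT30M`.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchTimeoutSketch {

    /**
     * Hypothetical helper: waits for another task to count the latch down
     * and fails if that does not happen within the timeout.
     */
    static void awaitOrFail(CountDownLatch latch, Duration timeout)
            throws InterruptedException {
        // await returns false when the timeout elapses before the count reaches zero.
        boolean completed = latch.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
        if (!completed) {
            throw new IllegalStateException("Another task did not ingest and complete ("
                    + latch + ") within " + timeout + ", therefore assuming it failed.");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Nothing ever counts this latch down, mimicking a task that died.
        CountDownLatch neverCompleted = new CountDownLatch(1);
        try {
            awaitOrFail(neverCompleted, Duration.ofMillis(100));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this pattern, a task that died without counting down its latch is only detected once the full timeout has elapsed, which would match the 30-minute stall described earlier.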

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:19:35Z


Anyway, killing it. We have the heap dump for the original OOME.