Open epag opened 2 months ago
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:04:15Z
For some absolutely bizarre reason I am getting a 403 on attempting to add the stack trace here (also in #103431), so I am attaching instead.
edit: oh, attachments not allowed either. Something is up w/ Redmine...
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:10:29Z
hprof uploaded to gdrive:
https://drive.google.com/drive/u/2/folders/1jREhjjJ2fUV99nnG7KG8wrATPdpsqMCR
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:15:27Z
Oh, my bad, I was using h2, not postgres. Still, I think this stands: it should complete.
The other thing is that the system tests should really fail when any one test OOMEs, so there is an error being trapped somewhere. The next test (after the one that failed with an OOME) has now been stuck for the last 30 minutes with this trace at the bottom of the log:
edit: sorry, I cannot post most of the trace due to issues with Redmine.
2022-04-14T11:44:56.952+0000 [Outer Reading/Ingest Thread 5 -> #177] ERROR wres.io.concurrency.IngestSaver - Callable task failed
.
.
Caused by: java.lang.InterruptedException: sleep interrupted
at java.base/java.lang.Thread.sleep(Native Method)
at org.h2.mvstore.MVMap.tryLock(MVMap.java:1923)
... 39 common frames omitted
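For illustration only, here is a minimal Java sketch of one way an `Error` such as an OOME can end up trapped rather than failing a test run. It is not taken from the WRES code; the class and method names (`TrappedErrorSketch`, `runAndUnwrap`, `failOnError`) are hypothetical. It relies only on standard `java.util.concurrent` behavior: a `Throwable` thrown inside a submitted task is stored in the `Future` and surfaces only when `get()` is called, wrapped in an `ExecutionException`. If the caller merely logs that exception, the run carries on.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not WRES code: shows where an Error thrown by a task ends up.
public class TrappedErrorSketch {

    /** Runs the task on a worker thread and returns the Throwable it raised, or null if none. */
    public static Throwable runAndUnwrap(Callable<?> task) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = pool.submit(task);
            try {
                future.get();
                return null;
            } catch (ExecutionException e) {
                // The original failure (even an Error) is carried as the cause.
                return e.getCause();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    /** Rethrows any Error so that the caller (for example, a test harness) fails fast. */
    public static void failOnError(Throwable cause) {
        if (cause instanceof Error) {
            throw (Error) cause;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated OOME; no memory is actually exhausted here.
        Callable<Void> task = () -> {
            throw new OutOfMemoryError("simulated");
        };
        Throwable cause = runAndUnwrap(task);
        System.out.println("Captured from worker: " + cause);
        try {
            failOnError(cause);
        } catch (OutOfMemoryError e) {
            System.out.println("Rethrown to caller: " + e.getMessage());
        }
    }
}
```

Under these assumptions, rethrowing the unwrapped `Error` at the point where task results are collected is one way to make a test fail as soon as any task hits an OOME.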
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:17:20Z
Anyway, given this is h2 and we know that requires a lot of memory, this is probably not a super high priority. The improvements to the system tests can probably be made elsewhere.
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:19:06Z
Eventually it moved on and failed for another reason:
Caused by: wres.io.reading.IngestException: Another task did not ingest and complete (java.util.concurrent.CountDownLatch@730d3dd5[Count = 0],java.util.concurrent.CountDownLatch@57b2bda3[Count = 1]) within PT30M, therefore assuming it failed.
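As a rough illustration of the kind of wait this message describes, the sketch below awaits two `CountDownLatch` instances with a timeout. It is an assumption about the general pattern, not the actual WRES ingest code; `LatchTimeoutSketch` and `awaitCompletion` are hypothetical names, and the timeout is shortened from the PT30M in the log so that the example finishes quickly.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not WRES code: waiting on another task's latch with a timeout.
public class LatchTimeoutSketch {

    /** Returns true if the latch reached zero within the timeout, false if the wait timed out. */
    public static boolean awaitCompletion(CountDownLatch latch, Duration timeout)
            throws InterruptedException {
        return latch.await(timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch finished = new CountDownLatch(0); // like "Count = 0" in the log
        CountDownLatch pending = new CountDownLatch(1);  // like "Count = 1": never counted down
        Duration timeout = Duration.ofMillis(50);        // the log shows PT30M; shortened here

        boolean firstDone = awaitCompletion(finished, timeout);
        boolean secondDone = awaitCompletion(pending, timeout);
        System.out.println("first=" + firstDone + ", second=" + secondDone);

        if (!secondDone) {
            // A caller could give up here and assume that the other task failed.
            System.out.println("Another task did not complete within " + timeout);
        }
    }
}
```

A latch that is never counted down, for example because the task that owned it died with an error, only shows up once the full timeout elapses, which would be consistent with the long stall before this failure.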
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-14T12:19:35Z
Anyway, killing it. We have the heap dump for the original oome.
Author Name: James (James) Original Redmine Issue: 103766, https://vlab.noaa.gov/redmine/issues/103766 Original Date: 2022-04-14
Given an evaluation of scenario009 (that happened to be in system test mode with `-Xmx512m`) When that evaluation proceeds Then it shouldn't OOME