Open epag opened 3 months ago
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T12:03:59Z
tl;dr, investigate #94126. Unclear on cursory inspection whether the software is working as intended or not. It appears to be working insofar as a finite number of attempts are made to re-acquire the lock on a source ingest in progress and, once exhausted, an exception is thrown. This seems logical, but I want to be sure that the semantics are working as expected, on the one hand, and cannot be made more resilient, on the other (this is a trade-off between recovery where possible and failing early when not).
Another minor issue is that the standard out attached to #94126 shows logging of complete time-series whose ingest failed, which may overwhelm the service in some cases (we have examples of this in other contexts if the time-series is sufficiently long). It should be replaced with logging of the ts metadata only.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T12:06:44Z
Hank, would you mind uploading the postgres log and @check_mk@ monitoring for roughly the time spanned by #94126? Jesse is out this week and he is the author of the db locking, so it would be a more efficient use of his time than mine (edit: that is to say, I would need to invest significant time to understand it before determining whether a fix were needed and, if so, fixing it). The job url and standard out is attached to #94126. Thanks!
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2021-07-13T13:23:27Z
Log is attached.
What Check_MK displays would you like to see?
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T13:30:11Z
Not sure what displays might be available for networking, but everything you can (or within reason, what makes sense to you) for the prod db server, I guess.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T13:42:26Z
Relevant part of the postgres log:
2021-07-13 03:29:03.099 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 LOG: disconnection: session time: 0:00:18.212 user=wres_user8 database=wres8 host=10.3.10.104 port=59538
2021-07-13 03:29:03.168 UTC 10.3.10.104 wres8 wres_user8 60ed031b.37ca LOG: disconnection: session time: 0:22:59.758 user=wres_user8 database=wres8 host=10.3.10.104 port=48014
2021-07-13 03:29:03.369 UTC 10.3.10.104 wres8 wres_user8 60ed018d.32bb LOG: disconnection: session time: 0:29:38.030 user=wres_user8 database=wres8 host=10.3.10.104 port=58870
2021-07-13 03:29:03.370 UTC 10.3.10.104 [unknown] [unknown] 60ed087f.4dcc LOG: connection received: host=10.3.10.104 port=59046
2021-07-13 03:29:03.387 UTC 10.3.10.104 wres8 wres_user8 60ed087f.4dcc LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 LOG: incomplete message from client
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 CONTEXT: COPY timeseriesvalue_lead_above_150, line 7376
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 STATEMENT: COPY wres.TimeSeriesValue_Lead_Above_150 ( lead,series_value,timeseries_id ) FROM STDIN WITH DELIMITER '|'
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 ERROR: unexpected EOF on client connection with an open transaction
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 CONTEXT: COPY timeseriesvalue_lead_above_150, line 7376
2021-07-13 03:29:03.446 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 STATEMENT: COPY wres.TimeSeriesValue_Lead_Above_150 ( lead,series_value,timeseries_id ) FROM STDIN WITH DELIMITER '|'
2021-07-13 03:29:03.447 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 FATAL: terminating connection because protocol synchronization was lost
2021-07-13 03:29:03.447 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 LOG: disconnection: session time: 0:00:18.613 user=wres_user8 database=wres8 host=10.3.10.104 port=59536
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T13:42:55Z
Looks like the problem started on the client side, but the real cause could be a few different things, I guess, including a network interruption or an abrupt close (edit: of the connection) by the client during a transaction.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T13:46:14Z
In any case, whatever happen occurred within a very short period because a connection from @wres_user8@ was received and authorized a few ms later.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T13:50:33Z
Ah, just spotted these two lines about PT1S earlier, which is potentially relevant too:
2021-07-13 03:29:02.244 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 WARNING: you don't own a lock of type ExclusiveLock
2021-07-13 03:29:02.307 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 WARNING: you don't own a lock of type ExclusiveLock
Looks like the client attempted to release a lock that it didn't own, so that may indicate some error in the locking semantics on the client side, not sure. Again, I think this will need to wait until Jesse returns, if I'm not to sink a ton of time into it, so just trying to gather sufficient details here.
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2021-07-13T14:34:24Z
You requested Check_MK graphs in the other ticket. I've attached graphs related to CPU load and utilization, and disk I/O here. My Check_MK-foo is very limited, so I'm not really sure what else to get or how to better format them.
Hank
Original Redmine Comment Author Name: Hank (Hank) Original Date: 2021-07-13T14:37:28Z
By the way, the job Jim reported in #94126 started at 2021-07-13T03:28:48.713+0000 and ended at 2021-07-13T03:29:09.138+0000. With plots provided in CST, the time to examine is around Monday at 22:30 hours.
Hank
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-13T14:52:48Z
I did, but I was thinking in terms of network health (traffic and status). In any case, on inspecting the postgres log, it looks unlikely that we'd see anything in @check_mk@.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-14T14:19:55Z
Some wild speculation in #94168-7. Again, I could look at this properly, but it would require me to fully review the implementation first, in order to understand what it is doing and why.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-14T14:23:56Z
As to why this has suddenly started to appear, best guess would be nwcal networking in the recent day or two.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-14T14:24:52Z
Btw, feel free to rename this ticket, upon investigation, once the real problem is discovered.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-07-19T17:02:52Z
Here is the first thing: @2021-07-13T03:29:06.839+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@2fa59863@
From WRES' perspective, it lost that connection.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-07-19T17:09:08Z
Probable section of postgres logs for full evaluation:
2021-07-13 03:28:42.731 UTC 10.3.10.104 [unknown] [unknown] 60ed086a.4da9 LOG: connection received: host=10.3.10.104 port=59512
2021-07-13 03:28:42.731 UTC 10.3.10.104 [unknown] [unknown] 60ed086a.4da9 LOG: incomplete startup packet
2021-07-13 03:28:42.906 UTC 10.3.10.104 [unknown] [unknown] 60ed086a.4daa LOG: connection received: host=10.3.10.104 port=59514
2021-07-13 03:28:43.812 UTC 10.3.10.104 wres8 wres_user8 60ed086a.4daa LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.456 UTC 10.3.10.104 wres8 wres_user8 60ed086a.4daa LOG: disconnection: session time: 0:00:01.550 user=wres_user8 database=wres8 host=10.3.10.104 port=59514
2021-07-13 03:28:44.488 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4dac LOG: connection received: host=10.3.10.104 port=59516
2021-07-13 03:28:44.550 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dac LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.568 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4dad LOG: connection received: host=10.3.10.104 port=59518
2021-07-13 03:28:44.598 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.669 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4dae LOG: connection received: host=10.3.10.104 port=59520
2021-07-13 03:28:44.705 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dae LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.711 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4daf LOG: connection received: host=10.3.10.104 port=59522
2021-07-13 03:28:44.717 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db0 LOG: connection received: host=10.3.10.104 port=59524
2021-07-13 03:28:44.748 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.761 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db0 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.763 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db1 LOG: connection received: host=10.3.10.104 port=59528
2021-07-13 03:28:44.773 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db2 LOG: connection received: host=10.3.10.104 port=59530
2021-07-13 03:28:44.801 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db1 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.819 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db2 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.820 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db3 LOG: connection received: host=10.3.10.104 port=59534
2021-07-13 03:28:44.833 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db4 LOG: connection received: host=10.3.10.104 port=59536
2021-07-13 03:28:44.875 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.883 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db4 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.887 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db6 LOG: connection received: host=10.3.10.104 port=59538
2021-07-13 03:28:44.892 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db7 LOG: connection received: host=10.3.10.104 port=59540
2021-07-13 03:28:44.924 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db7 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.936 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.940 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db8 LOG: connection received: host=10.3.10.104 port=59542
2021-07-13 03:28:44.974 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db8 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:44.993 UTC 10.3.10.104 [unknown] [unknown] 60ed086c.4db9 LOG: connection received: host=10.3.10.104 port=59544
2021-07-13 03:28:45.024 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db9 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:45.622 UTC 10.3.10.104 [unknown] [unknown] 60ed086d.4dba LOG: connection received: host=10.3.10.104 port=59546
2021-07-13 03:28:45.644 UTC 10.3.10.104 wres8 wres_user8 60ed086d.4dba LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:45.652 UTC 10.3.10.104 [unknown] [unknown] 60ed086d.4dbb LOG: connection received: host=10.3.10.104 port=59548
2021-07-13 03:28:45.677 UTC 10.3.10.104 wres8 wres_user8 60ed086d.4dbb LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf ERROR: could not serialize access due to read/write dependencies among transactions
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf DETAIL: Reason code: Canceled on identification as a pivot, during write.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf HINT: The transaction might succeed if retried.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf STATEMENT: INSERT INTO wres.Feature ( name, description, srid, wkt )
SELECT $1, $2, $3, $4
WHERE NOT EXISTS
(
SELECT 1
FROM wres.Feature
WHERE name = $5
AND description = $6
AND srid = $7
AND wkt = $8
)
RETURNING *
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad ERROR: could not serialize access due to read/write dependencies among transactions
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad DETAIL: Reason code: Canceled on identification as a pivot, during write.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad HINT: The transaction might succeed if retried.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad STATEMENT: INSERT INTO wres.Feature ( name, description, srid, wkt )
SELECT $1, $2, $3, $4
WHERE NOT EXISTS
(
SELECT 1
FROM wres.Feature
WHERE name = $5
AND description = $6
AND srid = $7
AND wkt = $8
)
RETURNING *
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 ERROR: could not serialize access due to read/write dependencies among transactions
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 DETAIL: Reason code: Canceled on identification as a pivot, during write.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 HINT: The transaction might succeed if retried.
2021-07-13 03:28:48.476 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 STATEMENT: INSERT INTO wres.Feature ( name, description, srid, wkt )
SELECT $1, $2, $3, $4
WHERE NOT EXISTS
(
SELECT 1
FROM wres.Feature
WHERE name = $5
AND description = $6
AND srid = $7
AND wkt = $8
)
RETURNING *
2021-07-13 03:28:48.478 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 ERROR: could not serialize access due to read/write dependencies among transactions
2021-07-13 03:28:48.478 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 DETAIL: Reason code: Canceled on identification as a pivot, during write.
2021-07-13 03:28:48.478 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 HINT: The transaction might succeed if retried.
2021-07-13 03:28:48.478 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 STATEMENT: INSERT INTO wres.Feature ( name, description, srid, wkt )
SELECT $1, $2, $3, $4
WHERE NOT EXISTS
(
SELECT 1
FROM wres.Feature
WHERE name = $5
AND description = $6
AND srid = $7
AND wkt = $8
)
RETURNING *
2021-07-13 03:29:01.984 UTC 10.3.10.104 [unknown] [unknown] 60ed087d.4dc9 LOG: connection received: host=10.3.10.104 port=59566
2021-07-13 03:29:02.096 UTC 10.3.10.104 wres8 wres_user8 60ed0193.32c3 LOG: disconnection: session time: 0:29:31.027 user=wres_user8 database=wres8 host=10.3.10.104 port=58876
2021-07-13 03:29:02.105 UTC 10.3.10.104 wres8 wres_user8 60ed018d.32bc LOG: disconnection: session time: 0:29:36.747 user=wres_user8 database=wres8 host=10.3.10.104 port=58872
2021-07-13 03:29:02.142 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:29:02.177 UTC 10.3.10.104 [unknown] [unknown] 60ed087e.4dca LOG: connection received: host=10.3.10.104 port=59044
2021-07-13 03:29:02.200 UTC 10.3.10.104 wres8 wres_user8 60ed087e.4dca LOG: connection authorized: user=wres_user8 database=wres8 SSL enabled (protocol=TLSv1.2, cipher=ECDHE-RSA-AES128-GCM-SHA256, compression=off)
2021-07-13 03:29:02.244 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 WARNING: you don't own a lock of type ExclusiveLock
2021-07-13 03:29:02.307 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 WARNING: you don't own a lock of type ExclusiveLock
2021-07-13 03:29:02.308 UTC 10.3.10.104 wres8 wres_user8 60ed087d.4dc9 LOG: disconnection: session time: 0:00:00.323 user=wres_user8 database=wres8 host=10.3.10.104 port=59566
2021-07-13 03:29:02.308 UTC 10.3.10.104 wres8 wres_user8 60ed086d.4dbb LOG: disconnection: session time: 0:00:16.656 user=wres_user8 database=wres8 host=10.3.10.104 port=59548
2021-07-13 03:29:02.392 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dac LOG: disconnection: session time: 0:00:17.904 user=wres_user8 database=wres8 host=10.3.10.104 port=59516
2021-07-13 03:29:02.392 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dae LOG: disconnection: session time: 0:00:17.723 user=wres_user8 database=wres8 host=10.3.10.104 port=59520
2021-07-13 03:29:02.394 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db9 LOG: disconnection: session time: 0:00:17.400 user=wres_user8 database=wres8 host=10.3.10.104 port=59544
2021-07-13 03:29:02.394 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db0 LOG: disconnection: session time: 0:00:17.677 user=wres_user8 database=wres8 host=10.3.10.104 port=59524
2021-07-13 03:29:02.394 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db2 LOG: disconnection: session time: 0:00:17.620 user=wres_user8 database=wres8 host=10.3.10.104 port=59530
2021-07-13 03:29:02.394 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db7 LOG: disconnection: session time: 0:00:17.503 user=wres_user8 database=wres8 host=10.3.10.104 port=59540
2021-07-13 03:29:02.398 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db8 LOG: disconnection: session time: 0:00:17.457 user=wres_user8 database=wres8 host=10.3.10.104 port=59542
2021-07-13 03:29:02.592 UTC 10.3.10.104 wres8 wres_user8 60ed01a0.32d1 LOG: disconnection: session time: 0:29:17.653 user=wres_user8 database=wres8 host=10.3.10.104 port=58880
2021-07-13 03:29:02.728 UTC 10.3.10.104 wres8 wres_user8 60ed031a.37c7 LOG: disconnection: session time: 0:23:00.630 user=wres_user8 database=wres8 host=10.3.10.104 port=58930
2021-07-13 03:29:03.024 UTC 10.3.10.104 wres8 wres_user8 60ed086d.4dba LOG: disconnection: session time: 0:00:17.402 user=wres_user8 database=wres8 host=10.3.10.104 port=59546
2021-07-13 03:29:03.097 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4dad LOG: disconnection: session time: 0:00:18.528 user=wres_user8 database=wres8 host=10.3.10.104 port=59518
2021-07-13 03:29:03.098 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4daf LOG: disconnection: session time: 0:00:18.387 user=wres_user8 database=wres8 host=10.3.10.104 port=59522
2021-07-13 03:29:03.098 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db1 LOG: disconnection: session time: 0:00:18.335 user=wres_user8 database=wres8 host=10.3.10.104 port=59528
2021-07-13 03:29:03.098 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db3 LOG: disconnection: session time: 0:00:18.278 user=wres_user8 database=wres8 host=10.3.10.104 port=59534
2021-07-13 03:29:03.099 UTC 10.3.10.104 wres8 wres_user8 60ed086c.4db6 LOG: disconnection: session time: 0:00:18.212 user=wres_user8 database=wres8 host=10.3.10.104 port=59538
2021-07-13 03:29:03.168 UTC 10.3.10.104 wres8 wres_user8 60ed031b.37ca LOG: disconnection: session time: 0:22:59.758 user=wres_user8 database=wres8 host=10.3.10.104 port=48014
2021-07-13 03:29:03.369 UTC 10.3.10.104 wres8 wres_user8 60ed018d.32bb LOG: disconnection: session time: 0:29:38.030 user=wres_user8 database=wres8 host=10.3.10.104 port=58870
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-07-20T21:10:36Z
Unfortunately this was able to be tested when I had VirtualBox and now it cannot be tested without, so any potential improvements will be speculative.
Original Redmine Comment Author Name: James (James) Original Date: 2021-07-20T21:55:29Z
I can install VirtualBox, so I may be able to help you out w/ testing if needed...
Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-07-21T12:02:19Z
Unfortunately this was able to be tested when I had VirtualBox and now it cannot be tested without, so any potential improvements will be speculative.
I wonder if we should be building up a list for these sorts of events, but I don't know where we'd put them or what they'd eventually be used for. I feel like having a lengthy list will help us when IT asks the usual "WHY DO yoU EVEN NeEd vIRTuAlbOX Or dOCKer?!"
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-07-21T17:31:40Z
James wrote:
I can install VirtualBox, so I may be able to help you out w/ testing if needed...
I guess step one would be to try it out with the current version and kill a socket from outside WRES and outside postgres and see what happens. I doubt we will be able to exactly replicate the same situation but we might be able to verify that the changes don't break anything.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-07-21T17:32:17Z
Chris wrote:
Unfortunately this was able to be tested when I had VirtualBox and now it cannot be tested without, so any potential improvements will be speculative.
I wonder if we should be building up a list for these sorts of events, but I don't know where we'd put them or what they'd eventually be used for. I feel like having a lengthy list will help us when IT asks the usual "WHY DO yoU EVEN NeEd vIRTuAlbOX Or dOCKer?!"
That sounds reasonable, but reason doesn't work with that group.
Author Name: James (James) Original Redmine Issue: 94128, https://vlab.noaa.gov/redmine/issues/94128 Original Date: 2021-07-13
Given an evaluation in progress that loses a database connection when a source is being ingested When that evaluation attempts to recover Then it should make a best effort to recover the source lock
( So that I can be sure that, when I see behavior like #94126, a best effort has been made to recover and that the situation is unrecoverable. )
Related issue(s): #261 Redmine related issue(s): 93204, 94168