NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service
Other
2 stars 1 forks source link

As a developer, I want to investigate and implement ways for the WRES to pick up on-going evaluations when a reboot sequence occurs #143

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: Hank (Hank) Original Redmine Issue: 122395, https://vlab.noaa.gov/redmine/issues/122395 Original Date: 2023-11-08


The problem: When a Wednesday reboot sequence occurs, the order of the machines going down and coming up is unpredictable. If the Postgres server for an evaluation goes down before the application servers, then on-going evaluations will fail. If the Postgres server comes back after the application servers, then on-going evaluations that were picked back up will fail. In both cases, its because the Postgres server communication is cutoff.

The WRES should be able to pick back up on-going evaluations if the application servers are brought down before Postgres is brought down, and Postgres is brought back up before the application servers are brought back up. I asked for that sequence to be put in place in ITSG-1250:

https://jira.nws.noaa.gov/servicedesk/customer/portal/1/ITSG-1250

However, ITSG reported,

At the current staff and workload/backlog levels, I do not believe we will be able to accommodate this request at this time

Is there any reasonable way for us to adjust the WRES code to handle this situation? If necessary, I can start tracking the amount of time between the different machine reboots, in case its just a matter of increasing some sort of retry-cycle to wait longer for Postgres to come back.

This ticket can be resolved once an approach is identified and implemented. This ticket can be rejected if there is really nothing we can do.

There are problem old tickets related to this request; I'll find them and relate them later. Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-11-08T14:34:58Z


Hank wrote:

Is there any reasonable way for us to adjust the WRES code to handle this situation? If necessary, I can start tracking the amount of time between the different machine reboots, in case its just a matter of increasing some sort of retry-cycle to wait longer for Postgres to come back.

Within the core application, I don't think so. Ultimately, database connections can be lost for a variety of reasons and, once retries have been exhausted, the context doesn't matter.

Likewise, if a postgres server returns an error code that could appear in a variety of contexts, some of which might be recoverable from a service-level or higher perspective and others not, then I don't think there's anything we can do in the core app, we need to look higher up.

Of course, we can always propagate error codes and the service layer has complete control over what it retries.

In theory, we could implement a rule above the service level (but respected by the service) that says "any evaluation that fails during this datetime interval should be retried by default". It would be a bit of a tank-at-knife-fight approach, but I don't see much joy in inspecting error codes from postgres etc. because they likely don't clarify machine reboots as distinct from other server problems.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-11-08T14:47:19Z


These are the error codes we have to work with in terms of adding retry resilience:

https://www.postgresql.org/docs/current/errcodes-appendix.html

However, connection behavior is largely managed through Hikari connection pools, other than test connections at server start-up (which means per evaluation in the standalone).

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-11-09T17:36:12Z


Notes from meeting:

Do we have examples of it occurring in production that we can refer to? If not, perhaps do an experiment next week where I post a long evaluation just prior to the reboots. Report on stdout logging the next day. Is the error code included?

Discussed what kind of failure and will occur and how it will be logged. May need to be orchestrated in production.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-11-09T17:42:04Z


I found a complete exception message for a failed evaluation due to reboots with job id 8953870426895545144.

Hank

=================================

2023-11-08T21:38:48.190+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@14a83e56
2023-11-08T21:38:48.202+0000 WARN DatabaseLockManagerPostgres Exception while managing connections: 
java.lang.IllegalStateException: Could not get a raw database connection.
    at wres.io.database.Database.getRawConnection(Database.java:186)
    at wres.io.database.locking.DatabaseLockManagerPostgres.testAndRefresh(DatabaseLockManagerPostgres.java:823)
    at wres.io.database.locking.DatabaseLockManagerPostgres$RefreshConnectionsTask.run(DatabaseLockManagerPostgres.java:1056)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
    at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
    at org.postgresql.Driver.makeConnection(Driver.java:443)
    at org.postgresql.Driver.connect(Driver.java:297)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at wres.io.database.Database.getRawConnection(Database.java:331)
    at wres.io.database.Database.getRawConnection(Database.java:182)
    ... 8 common frames omitted
2023-11-08T21:38:49.203+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@14a83e56
2023-11-08T21:38:49.215+0000 WARN DatabaseLockManagerPostgres Exception while managing connections: 
java.lang.IllegalStateException: Could not get a raw database connection.
    at wres.io.database.Database.getRawConnection(Database.java:186)
    at wres.io.database.locking.DatabaseLockManagerPostgres.testAndRefresh(DatabaseLockManagerPostgres.java:823)
    at wres.io.database.locking.DatabaseLockManagerPostgres$RefreshConnectionsTask.run(DatabaseLockManagerPostgres.java:1056)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
    at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
    at org.postgresql.Driver.makeConnection(Driver.java:443)
    at org.postgresql.Driver.connect(Driver.java:297)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at wres.io.database.Database.getRawConnection(Database.java:331)
    at wres.io.database.Database.getRawConnection(Database.java:182)
    ... 8 common frames omitted
2023-11-08T21:38:50.216+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@14a83e56
2023-11-08T21:38:50.228+0000 WARN DatabaseLockManagerPostgres Exception while managing connections: 
java.lang.IllegalStateException: Could not get a raw database connection.
    at wres.io.database.Database.getRawConnection(Database.java:186)
    at wres.io.database.locking.DatabaseLockManagerPostgres.testAndRefresh(DatabaseLockManagerPostgres.java:823)
    at wres.io.database.locking.DatabaseLockManagerPostgres$RefreshConnectionsTask.run(DatabaseLockManagerPostgres.java:1056)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
    at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
    at org.postgresql.Driver.makeConnection(Driver.java:443)
    at org.postgresql.Driver.connect(Driver.java:297)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at wres.io.database.Database.getRawConnection(Database.java:331)
    at wres.io.database.Database.getRawConnection(Database.java:182)
    ... 8 common frames omitted
2023-11-08T21:38:51.228+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@14a83e56
2023-11-08T21:38:51.240+0000 WARN DatabaseLockManagerPostgres Exception while managing connections: 
java.lang.IllegalStateException: Could not get a raw database connection.
    at wres.io.database.Database.getRawConnection(Database.java:186)
    at wres.io.database.locking.DatabaseLockManagerPostgres.testAndRefresh(DatabaseLockManagerPostgres.java:823)
    at wres.io.database.locking.DatabaseLockManagerPostgres$RefreshConnectionsTask.run(DatabaseLockManagerPostgres.java:1056)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
    at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
    at org.postgresql.Driver.makeConnection(Driver.java:443)
    at org.postgresql.Driver.connect(Driver.java:297)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at wres.io.database.Database.getRawConnection(Database.java:331)
    at wres.io.database.Database.getRawConnection(Database.java:182)
    ... 8 common frames omitted
2023-11-08T21:38:52.241+0000 WARN DatabaseLockManagerPostgres About to restore a connection to database (1), lost org.postgresql.jdbc.PgConnection@14a83e56
2023-11-08T21:38:52.253+0000 WARN DatabaseLockManagerPostgres Exception while managing connections: 
java.lang.IllegalStateException: Could not get a raw database connection.
    at wres.io.database.Database.getRawConnection(Database.java:186)
    at wres.io.database.locking.DatabaseLockManagerPostgres.testAndRefresh(DatabaseLockManagerPostgres.java:823)
    at wres.io.database.locking.DatabaseLockManagerPostgres$RefreshConnectionsTask.run(DatabaseLockManagerPostgres.java:1056)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
    at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:693)
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:203)
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:258)
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
    at org.postgresql.Driver.makeConnection(Driver.java:443)
    at org.postgresql.Driver.connect(Driver.java:297)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
    at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
    at wres.io.database.Database.getRawConnection(Database.java:331)
    at wres.io.database.Database.getRawConnection(Database.java:182)
    ... 8 common frames omitted
2023-11-08T21:38:52.527+0000 WARN DatabaseLockManagerPostgres Unable to unlock remaining database locks.
org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:993)
    at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1819)
    at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:518)
    at wres.io.database.locking.DatabaseLockManagerPostgres.releaseSingleLock(DatabaseLockManagerPostgres.java:1227)
    at wres.io.database.locking.DatabaseLockManagerPostgres.pgUnlockShared(DatabaseLockManagerPostgres.java:757)
    at wres.io.database.locking.DatabaseLockManagerPostgres.shutdown(DatabaseLockManagerPostgres.java:152)
    at wres.pipeline.Evaluator.evaluate(Evaluator.java:316)
    at wres.pipeline.Evaluator.evaluate(Evaluator.java:125)
    at wres.Functions.execute(Functions.java:282)
    at wres.Functions.call(Functions.java:151)
    at wres.Main.completeExecution(Main.java:236)
    at wres.Main.main(Main.java:136)
2023-11-08T21:38:52.527+0000 WARN DatabaseLockManagerPostgres WRES did not clean up by unlocking these shared locks: [1]
2023-11-08T21:38:52.528+0000 INFO Functions The function 'execute' took PT5H33M57.459932344S
2023-11-08T21:41:40.982+0000 ERROR Main Operation 'execute' completed unsuccessfully
wres.pipeline.InternalWresException: Could not complete project execution
    at wres.pipeline.Evaluator.evaluate(Evaluator.java:297)
    at wres.pipeline.Evaluator.evaluate(Evaluator.java:125)
    at wres.Functions.execute(Functions.java:282)
    at wres.Functions.call(Functions.java:151)
    at wres.Main.completeExecution(Main.java:236)
    at wres.Main.main(Main.java:136)
Caused by: wres.pipeline.WresProcessingException: Encountered an error while processing evaluation 'uDTOcmmq1EvLbUik71XXs9nvzg4': 
    at wres.pipeline.EvaluationUtilities.evaluate(EvaluationUtilities.java:306)
    at wres.pipeline.Evaluator.evaluate(Evaluator.java:271)
    ... 5 common frames omitted
Caused by: wres.pipeline.WresProcessingException: Project failed to complete with the following error: 
    at wres.pipeline.EvaluationUtilities.evaluate(EvaluationUtilities.java:608)
    at wres.pipeline.EvaluationUtilities.evaluate(EvaluationUtilities.java:250)
    ... 6 common frames omitted
Caused by: java.util.concurrent.CompletionException: wres.io.retrieving.DataAccessException: Failed to access the time-series data.
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: wres.io.retrieving.DataAccessException: Failed to access the time-series data.
    at wres.io.retrieving.database.TimeSeriesRetriever.getTimeSeriesFromScript(TimeSeriesRetriever.java:516)
    at wres.io.retrieving.database.EnsembleForecastRetriever.get(EnsembleForecastRetriever.java:184)
    at wres.io.retrieving.database.EnsembleForecastRetriever.get(EnsembleForecastRetriever.java:30)
    at wres.pipeline.pooling.PoolSupplier.createPool(PoolSupplier.java:231)
    at wres.pipeline.pooling.PoolSupplier.get(PoolSupplier.java:201)
    at wres.pipeline.pooling.PoolSupplier.get(PoolSupplier.java:79)
    at wres.pipeline.pooling.PoolFactory$SupplierWithPoolRequest.get(PoolFactory.java:2398)
    at wres.pipeline.pooling.PoolProcessor.get(PoolProcessor.java:130)
    at wres.pipeline.pooling.PoolProcessor.get(PoolProcessor.java:56)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    ... 3 common frames omitted
Caused by: java.sql.SQLException: The Query: 
wres.io.database.Query@669ce252[script=SELECT 
    metadata.series_id AS series_id,
    metadata.reference_time + INTERVAL '1' MINUTE * TSV.lead AS valid_time,
    metadata.reference_time,
    ARRAY_AGG(TSV.series_value ORDER BY TS.ensemble_id) AS ensemble_members,
    ARRAY_AGG(TS.ensemble_id ORDER BY TS.ensemble_id) AS ensemble_ids,
    metadata.measurementunit_id,
    metadata.scale_period,
    metadata.scale_function,
    metadata.feature_id,
    metadata.occurrences
FROM
(
    SELECT 
        S.source_id AS series_id,
        MAX( reference_time ) AS reference_time,
        S.feature_id,
        S.measurementunit_id,
        TimeScale.duration_ms AS scale_period,
        TimeScale.function_name AS scale_function,
        COUNT(*) AS occurrences 
    FROM wres.Source S
    INNER JOIN wres.ProjectSource PS
        ON PS.source_id = S.source_id
    INNER JOIN wres.TimeSeriesReferenceTime TSRT
        ON TSRT.source_id = S.source_id
    LEFT JOIN wres.TimeScale TimeScale
        ON TimeScale.timescale_id = S.timescale_id
WHERE PS.project_id = ?
    AND S.variable_name = ?
    AND S.feature_id = ?
    AND PS.member = ?
    GROUP BY S.source_id,
        S.measurementunit_id,
        TimeScale.duration_ms,
        TimeScale.function_name
) AS metadata 
INNER JOIN wres.TimeSeries TS
    ON TS.source_id = metadata.series_id
INNER JOIN wres.TimeSeriesValue TSV
    ON TSV.timeseries_id = TS.timeseries_id
WHERE TSV.lead > ?
    AND TSV.lead <= ?
    AND ( EXTRACT( MONTH FROM metadata.reference_time ) > ? OR ( EXTRACT( MONTH FROM metadata.reference_time ) = ? AND EXTRACT( DAY FROM metadata.reference_time ) >= ? ) )
    AND ( EXTRACT( MONTH FROM metadata.reference_time ) < ? OR ( EXTRACT( MONTH FROM metadata.reference_time ) = ? AND EXTRACT( DAY FROM metadata.reference_time ) <= ? ) )
GROUP BY metadata.series_id,metadata.reference_time, metadata.feature_id, TSV.lead, metadata.scale_period, metadata.scale_function, metadata.measurementunit_id,metadata.occurrences
,forceTransaction=false,useCursor=true,parameters={5139,SQIN,2081041,right,502200,505080,1,1,1,12,12,31},batchParameters=<null>,sqlStatesToRetry=[],insertedIds=[]]

Failed.
    at wres.io.database.Query.call(Query.java:448)
    at wres.io.database.Database.buffer(Database.java:276)
    at wres.io.database.DataScripter.buffer(DataScripter.java:228)
    at wres.io.retrieving.database.TimeSeriesRetriever.getTimeSeriesFromScript(TimeSeriesRetriever.java:466)
    ... 12 common frames omitted
Caused by: org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2713)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2401)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:368)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:498)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:415)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
    at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
    at com.github.marschall.jfr.jdbc.JfrPreparedStatement.executeQuery(JfrPreparedStatement.java:70)
    at wres.io.database.Query.callWithParameters(Query.java:709)
    at wres.io.database.Query.call(Query.java:410)
    ... 15 common frames omitted
2023-11-08T21:41:40.983+0000 INFO Main Closing the application...
epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-11-09T17:45:22Z


Evan mentioned putting the job into a different state causing it to be requeued. However, James recommended coding in resilience at the app level, not service layer.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2024-06-28T11:55:38Z


Arguably should be phrased more generally, but there is a lot of NWCAL-centric commentary in this ticket. If we migrate to github, it will need to be rewritten.