Closed cransom closed 1 year ago
Tested locally by running postgres and hydra-evaluator via foreman, then stopping postgres.
1ogre:~/git/branches/hydra/master% foreman start hydra-evaluator
16:07:56 hydra-evaluator.1 | started with pid 417284
16:07:56 hydra-evaluator.1 | Connection to localhost (127.0.0.1) 63333 port [tcp/*] succeeded!
16:07:58 hydra-evaluator.1 | received jobset event
16:07:58 hydra-evaluator.1 | exception in database monitor thread: Lost connection to the database server.
16:08:28 hydra-evaluator.1 | exception in database monitor thread: Lost connection to the database server.
# here, the db was restarted but evaluator would never pick up on that.
^C16:08:53 system | SIGINT received, starting shutdown
16:08:54 system | sending SIGTERM to all processes
16:08:54 hydra-evaluator.1 | exited with code 1
2022-10-26 16:08:54-0400 [master|✚1…2]
1ogre:~/git/branches/hydra/master% foreman start hydra-evaluator
16:08:59 hydra-evaluator.1 | started with pid 420495
16:08:59 hydra-evaluator.1 | Connection to localhost (127.0.0.1) 63333 port [tcp/*] succeeded!
16:09:09 hydra-evaluator.1 | received jobset event
16:09:13 hydra-evaluator.1 | received jobset event
16:09:13 hydra-evaluator.1 | Database connection broken: Lost connection to the database server.
16:09:13 hydra-evaluator.1 | exited with code 1
16:09:13 system | sending SIGTERM to all processes
@grahamc can you take a look? I cannot judge the impact on upstream Hydra.
Thanks!
There is currently no automatic recovery from a lost database connection in the evaluator. If the database becomes temporarily unavailable, hydra-evaluator sits and spins without accomplishing any work, even after the database is back.
With this change, when the condition is caught, the daemon exits with a non-zero status and systemd is responsible for restarting the service.