Closed. mir-am closed this 2 years ago.
In the current state on monster, this change will crash and restart the REST API all the time. That may be preferable to the current state, in which the container keeps running but is actually flooding the log with errors. I don't expect this PR to make the overall problem go away.
This is part of the solution that Sebastian (@proksch) and I devised today to address the issue at both the code and infrastructure levels. In short, the solution is:
- Catch PSQLException and stop the plugins/containers in the new loader and in the FASTEN server, including the REST API.
- Mark monster as unschedulable in K8s so that the node does not get heavily loaded and disrupt the Postgres server.
Can't K8s be configured to limit its resource usage on that node? Given that many analyses are still running, I think not using monster's capabilities would be a setback.
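We can use the Kubernetes node-pressure eviction feature to terminate pods when the node's memory usage exceeds a certain threshold. Eviction signals cover memory, disk, and PIDs but not CPU, so CPU usage would have to be capped separately through container CPU limits.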
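To illustrate how node-pressure eviction decides when and whom to evict, here is a toy model in Java. It is a simplification written for this discussion, not the kubelet's code: it models a single hard threshold on the memory.available signal (node capacity minus the node's memory working set) and the documented eviction ranking (pods using more than their request first, then lower priority first, then larger usage above request first). The class EvictionModel, the Pod type, and the 500Mi figure in the test are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model of a kubelet hard-eviction check on the memory.available signal.
 * ASSUMPTION: a simplification for illustration, not the kubelet's actual code.
 */
final class EvictionModel {
    static final long MI = 1024L * 1024L; // one mebibyte ("Mi")

    /** Hypothetical pod snapshot: memory usage and request in bytes, plus pod priority. */
    static final class Pod {
        final String name;
        final long usageBytes;
        final long requestBytes;
        final int priority;

        Pod(String name, long usageBytes, long requestBytes, int priority) {
            this.name = name;
            this.usageBytes = usageBytes;
            this.requestBytes = requestBytes;
            this.priority = priority;
        }
    }

    /** memory.available is roughly node capacity minus the node's memory working set. */
    static long memoryAvailable(long capacityBytes, long workingSetBytes) {
        return capacityBytes - workingSetBytes;
    }

    /** A hard threshold such as memory.available<500Mi is breached once available memory drops below it. */
    static boolean underMemoryPressure(long availableBytes, long thresholdBytes) {
        return availableBytes < thresholdBytes;
    }

    /**
     * Ranks pods for eviction: pods whose usage exceeds their request come first,
     * then lower priority before higher priority, then larger usage above request first.
     */
    static List<Pod> evictionOrder(List<Pod> pods) {
        List<Pod> ordered = new ArrayList<>(pods);
        ordered.sort(
                Comparator.comparing((Pod p) -> p.usageBytes <= p.requestBytes) // false (exceeds request) sorts first
                        .thenComparingInt(p -> p.priority)                         // lower priority first
                        .thenComparing(Comparator
                                .comparingLong((Pod p) -> p.usageBytes - p.requestBytes)
                                .reversed()));                                     // larger excess first
        return ordered;
    }
}
```

In a real cluster the threshold is not coded but configured on the kubelet, for example through the evictionHard field of the kubelet configuration file or the --eviction-hard=memory.available<500Mi flag (the value is only an example). Per-container resource requests and limits are the complementary way to bound what K8s workloads may consume on a node like monster.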
Description
This PR adds custom exception handling to the REST API for PSQLException. When this exception occurs, the exception handler stops the REST API app/container so that K8s or DC restarts it, which creates a new PSQL DB connection.
Motivation and context
As discussed in #461 and #464, the REST API returns an internal server error (HTTP 500) once its DB connection is closed.
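The stop-on-DB-error handling described above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the PR's code: the class name ShutdownOnDbErrorHandler is hypothetical, the JDK's java.sql.SQLException (which PSQLException extends) replaces the driver class so the snippet compiles without the PostgreSQL JDBC driver, and the shutdown action is injected as a Runnable so it can be tested without terminating the JVM.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of "stop the app when the DB connection breaks". ShutdownOnDbErrorHandler is a
 * hypothetical name; java.sql.SQLException (the superclass of PSQLException) stands in for
 * the driver's exception so this compiles without the PostgreSQL JDBC driver.
 */
final class ShutdownOnDbErrorHandler {
    private static final int MAX_CAUSE_DEPTH = 32; // guards against cyclic cause chains

    private final Runnable shutdown; // e.g. () -> System.exit(1) in a real deployment
    private final AtomicBoolean triggered = new AtomicBoolean(false);

    ShutdownOnDbErrorHandler(Runnable shutdown) {
        this.shutdown = shutdown;
    }

    /** Returns the first SQLException in the cause chain of t, or null if there is none. */
    static SQLException findSqlException(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (cur instanceof SQLException) {
                return (SQLException) cur;
            }
            cur = cur.getCause();
        }
        return null;
    }

    /**
     * Handles a failure raised while serving a request. If it was caused by a SQLException,
     * the shutdown callback runs (at most once, even under concurrent requests) so that the
     * orchestrator can restart the container with a fresh DB connection.
     * Returns true if the failure was DB-related.
     */
    boolean handle(Throwable failure) {
        if (findSqlException(failure) == null) {
            return false; // not a DB error: leave it to the normal error handling
        }
        if (triggered.compareAndSet(false, true)) {
            shutdown.run();
        }
        return true;
    }
}
```

In a Spring Boot app, one plausible wiring (an assumption, not confirmed by the PR) is a @ControllerAdvice method annotated with @ExceptionHandler(PSQLException.class) that delegates to handle(...), with the shutdown callback calling System.exit(SpringApplication.exit(context, () -> 1)). The restart itself depends on the orchestrator: pods managed by a K8s Deployment are always restarted, while a Docker Compose service (assuming that is what DC refers to) needs a restart policy such as restart: on-failure. Exiting on every SQLException is deliberately broad, close to the PR's behavior; a narrower variant could check getSQLState() for class 08 (connection exceptions, e.g. 08003 connection_does_not_exist) so that ordinary query errors do not restart the API.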
Testing
Tested with the DC.