Closed pbolduc closed 1 year ago
Edit: Looking back at the history, this appears to be the only recent occurrence.
We see the COMS pod reporting unhealthy during this time.
Hey Phil. I'm looking into these timeouts this week. I suspect it may be a combination of SQL queries and the way we handle unresolved JavaScript promises. Are you able to let me know the resource allocation for your Patroni cluster (e.g. CPU and memory, number of pods), or whether you use a single Postgres pod?
Is it mainly the searchObjects API call that times out under high load?
Thanks
Closing this issue as the error conditions described are not repeatable. That being said, we did an internal performance review pass and added a few indexes to the permission tables to improve lookup speed in #162. Should a similar issue appear again in the upcoming COMS v0.4.1 release, please feel free to request that this issue be reopened, or file a new issue.
Describe the bug
We are seeing issues with client timeouts. This is the same situation as closed issue #134. During high query activity searching for files, we see client timeouts. The client's default timeout is 100 seconds. In various cases, calls to the COMS service never return, and because they never return, nothing is logged in the console of the COMS pods. Tracking it down, the most likely cause would be unresolved promises. This led me to look at where promises are being resolved. I am not very proficient in JavaScript, so I may not be correct.
The problem definitely occurs when the system is under load. For example, when we are getting about 20-30 requests/second to COMS, we start to get timeouts.
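One way to make a call that never returns visible in the logs is to race it against a timer. The following is only a minimal sketch of that idea, not COMS code: `withTimeout`, `neverSettles`, and `demo` are hypothetical names, and the 50 ms limit is chosen purely for illustration.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a call whose promise is never resolved or rejected.
const neverSettles = () => new Promise(() => {});

async function demo() {
  try {
    await withTimeout(neverSettles(), 50, 'searchObjects');
    return 'resolved';
  } catch (err) {
    // The stalled call now produces a log line instead of hanging silently.
    console.error(err.message);
    return err.message;
  }
}
```

Calling `demo()` logs the timeout message rather than leaving the caller waiting. A guard like this does not fix the underlying stall, but it would show which call is hanging.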
In the various services that query the database, the `then` handler does not provide an errorHandler parameter. By not handling the errors, the service could return SQL and other information about the application. See Error handling in the Objection documentation. Additionally, it seems the `searchObjects` controller function does not map the error using `errorToProblem` like other methods.
Version Number
Additional context
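To illustrate the error handling gap described under "Describe the bug", here is a minimal sketch of a controller that catches a rejected service promise and maps it before passing it on. It is written under assumptions and is not the actual COMS code: `errorToProblem` here is a hypothetical stand-in (the real helper's signature and status codes may differ), and `objectService`, `controller`, and the error text are invented for the example.

```javascript
// Hypothetical stand-in for the errorToProblem helper; the real signature may differ.
function errorToProblem(service, err) {
  // Keep the detail in the server log, return only a generic problem to the client.
  console.error(`[${service}]`, err.message);
  return { status: 500, title: 'Internal Server Error', detail: `${service} request failed` };
}

// Hypothetical service whose query rejects with SQL text in the message.
const objectService = {
  searchObjects: () =>
    Promise.reject(new Error('select * from "object" - connection terminated'))
};

// Controller sketch: every rejection is caught and handed to next(),
// so the request completes and no SQL detail reaches the client.
const controller = {
  async searchObjects(req, res, next) {
    try {
      const result = await objectService.searchObjects(req.query);
      res.status(200).json(result);
    } catch (err) {
      next(errorToProblem('ObjectService', err));
    }
  }
};
```

The same effect can be had with promise chains by passing a second argument to `then()` or by appending `.catch()`. What matters is that the rejection is handled somewhere before the response is sent.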