tchoutri closed this 11 months ago
As a PR lgtm.
I'm not entirely sure it fixes the root issue, which is that an exception on the client side causes the process to hang with no output regarding the cause. Though maybe we should create another issue so at least if it comes up again in another form, it's documented.
@RaoulHC I think the root issue might be at the pg driver level, which is far below decks for us. :/
Given the exceptions are caught on the server side, my hunch is that it's in poolboy where the exceptions are getting lost, but I'm by no means certain.
We can perhaps ask @blackheaven directly, for poolboy. :)
poolboy is an in-memory queue system; jobs are executed in dedicated worker threads, which is why you cannot catch the exception from the main thread.
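To illustrate the point about worker threads, here is a minimal sketch using only base. It is not poolboy's implementation or API: runInWorker is a hypothetical helper, and the assumption is simply that a job runs on a thread created with forkIO. An exception thrown there never reaches the caller unless the worker captures it and hands it back explicitly, here through an MVar.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- Hypothetical helper (not part of poolboy): run a job on its own thread
-- and report the outcome, including any exception, back to the caller.
runInWorker :: IO a -> IO (Either SomeException a)
runInWorker job = do
  result <- newEmptyMVar
  -- 'try' runs inside the worker, so the exception is captured there
  -- and shipped to the caller as an ordinary value.
  _ <- forkIO (try job >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  ok <- runInWorker (pure (21 * 2 :: Int))
  failed <- runInWorker (throwIO (userError "boom") :: IO Int)
  putStrLn (either (const "first job failed") (\n -> "first job returned " ++ show n) ok)
  putStrLn (either (const "second job failed in the worker") (const "second job succeeded") failed)
```

Without the try and the MVar, the worker would just die on its own and a caller waiting on its result would block, which looks a lot like a silent hang.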
I'm not sure it's related to your PR's intent, but, IIRC, when I introduced it, I set the number of workers to the number of PostgreSQL connections; maybe it would be worth decoupling them.
I looked into it a bit more and it seems the hanging has to do with the empty logging function, though I'm not entirely sure why.
I tried to debug our case by using the simpleSerializedLogger and was unable to reproduce it with logging enabled, though I still could without it. Changing the default log function in poolboy as follows causes it to output the error and stop, rather than silently hang, but something strange is going on.
- log = \_ -> return ()
+ log = \x -> seq x $ return ()
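A small standalone sketch of what that one-line change does, assuming (this is an assumption, not poolboy's real type) that the log function takes a String. If the message is a thunk that throws when evaluated, the no-op logger never touches it, while the seq version forces it to weak head normal form and so surfaces the exception at the call site. The names noopLog, forcingLog and badMessage are made up for the example.

```haskell
import Control.Exception (SomeException, try)

-- The original default: ignores its argument, so the message is never evaluated.
noopLog :: String -> IO ()
noopLog = \_ -> return ()

-- The modified default: 'seq' forces the message to weak head normal form first.
forcingLog :: String -> IO ()
forcingLog = \x -> seq x $ return ()

-- Hypothetical log message whose evaluation throws.
badMessage :: String
badMessage = error "failure hidden in the log message"

main :: IO ()
main = do
  quiet <- try (noopLog badMessage) :: IO (Either SomeException ())
  putStrLn (either (const "no-op logger: exception raised") (const "no-op logger: nothing raised") quiet)
  loud <- try (forcingLog badMessage) :: IO (Either SomeException ())
  putStrLn (either (const "forcing logger: exception raised") (const "forcing logger: nothing raised") loud)
```

Note that seq only evaluates the outermost constructor, so a failure buried deeper inside the message would still go unnoticed.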
Thanks, I'll do it asap
I've gone down a bit of a rabbit hole of concurrency and exceptions today, and ended up swapping out the logging function in poolboy for log-base so that I could share the same logging framework as flora and try to debug it better.
The previous suggestion seems to improve the situation but not entirely fix it. For reasons I can't quite figure out, the exception can end up getting raised outside of the tryAny in the worker thread. Changing to the unliftio package is a bit better and the exception is propagated to the main thread more often, but it still seems to slip past and not get caught, and I'm not entirely sure how to ensure it's caught in the worker thread. It doesn't help that turning on profiling or changing the optimisation level seems to change the behaviour; it looks like some subtle result of laziness.
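One way an exception can escape a try-style wrapper because of laziness, sketched with base only. This is a guess at the kind of mechanism involved, not a diagnosis of poolboy: lazyJob, runLazily and runForced are hypothetical names, and plain try stands in for tryAny. If a job returns an unevaluated thunk, try sees a successful result, and the exception only fires later, wherever the value is finally forced. Forcing the result with evaluate inside the try keeps the failure in the worker.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical job: it returns immediately, but its result is a thunk
-- that throws (divide by zero) when it is eventually evaluated.
lazyJob :: IO Int
lazyJob = pure (1 `div` 0)

-- Plain 'try': only catches exceptions raised while the action itself runs.
runLazily :: IO a -> IO (Either SomeException a)
runLazily = try

-- 'try' plus 'evaluate': forces the result to weak head normal form
-- while still inside the handler.
runForced :: IO a -> IO (Either SomeException a)
runForced job = try (job >>= evaluate)

main :: IO ()
main = do
  lazy <- runLazily lazyJob
  case lazy of
    Left _ -> putStrLn "lazy: caught inside try"
    Right v -> do
      putStrLn "lazy: try returned Right, the failure is still pending"
      later <- try (evaluate v) :: IO (Either SomeException Int)
      putStrLn (either (const "lazy: forcing the result later raised an exception") (const "lazy: no exception") later)
  forced <- runForced lazyJob
  putStrLn (either (const "forced: caught inside try") (const "forced: no exception") forced)
```

evaluate only goes as far as weak head normal form; for nested structures something like force from the deepseq package would be needed, at the cost of an NFData constraint.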
I've done a release
It is integrated in Horizon Haskell
fixes #460
Proposed changes
Lower the number of DB pool connections to 50 in order to avoid saturating the DB server when the application and an import job are running simultaneously.