This looks great! What is the plan here? Do we plan to remove support for at least the Sojourn pool then? I think we should come up with a plan to remove code before adding even more. :)
I also recall you saying this supports ownership out of the box. Should we force this pool to be used whenever we want the ownership style tests?
> What is the plan here?
There isn't a plan here yet, just that I knew we could have a faster pool.
> Do we plan to remove support for at least the Sojourn pool then?
I would like to do this very much. I would actually like to remove all the pools and just use this one and ownership. Or some other big simplification.
An alternative would be to always supervise connections away from the actual pool process like this, so that most of the logic lives in a shared off-tree worker, with a light message-passing/queuing loop in the pooled worker. By connecting in another process we can improve the queuing in all existing pools (except Sojourn, which already does it). I think it would also allow us to simplify Ownership by not having a nested pool and always running `pool_size` proxy processes.
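For illustration, here is a minimal sketch of that layout (module names and code are hypothetical, not db_connection's): connection workers are supervised off the pool's tree, and the pooled worker is only a light message-passing/queuing loop over their pids.

```elixir
defmodule SketchPool do
  use GenServer

  # The pooled worker only queues clients and hands out pids of connection
  # workers that are supervised elsewhere.
  def start_link(conn_pids), do: GenServer.start_link(__MODULE__, conn_pids)
  def checkout(pool), do: GenServer.call(pool, :checkout)
  def checkin(pool, conn), do: GenServer.cast(pool, {:checkin, conn})

  @impl true
  def init(conn_pids), do: {:ok, {conn_pids, :queue.new()}}

  @impl true
  def handle_call(:checkout, _from, {[conn | conns], waiting}) do
    {:reply, {:ok, conn}, {conns, waiting}}
  end

  def handle_call(:checkout, from, {[], waiting}) do
    # No free connection: queue the client instead of blocking the pool.
    {:noreply, {[], :queue.in(from, waiting)}}
  end

  @impl true
  def handle_cast({:checkin, conn}, {conns, waiting}) do
    case :queue.out(waiting) do
      {{:value, from}, waiting} ->
        GenServer.reply(from, {:ok, conn})
        {:noreply, {conns, waiting}}

      {:empty, waiting} ->
        {:noreply, {[conn | conns], waiting}}
    end
  end
end

# Connections live under their own supervisor, away from the pool's tree.
{:ok, conn_sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

conns =
  for _ <- 1..5 do
    # Stand-in for a real connection worker process.
    {:ok, pid} = DynamicSupervisor.start_child(conn_sup, {Agent, fn -> :connected end})
    pid
  end

{:ok, pool} = SketchPool.start_link(conns)
{:ok, conn} = SketchPool.checkout(pool)
:ok = SketchPool.checkin(pool, conn)
```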
> I also recall you saying this supports ownership out of the box. Should we force this pool to be used whenever we want the ownership style tests?
🤔 this doesn't actually work with ownership because of https://github.com/elixir-ecto/db_connection/issues/106 (n.b. this issue is trivial to solve for most cases by having a watcher per pool type). If we stopped allowing custom pool behaviors it could work by using the same message-passing semantics, i.e. a contract on messages rather than functions. This would also solve the issue of having to pass the `pool` option.
> I would like to do this very much. I would actually like to remove all the pools and just use this one and ownership. Or some other big simplification.
:+1:
> If we stopped allowing custom pool behaviors it could work by using the same message passing semantics
:+1:
We should add an `expire_timeout` or similar option that disconnects a connection on checkin/poll if it is older than a timeout.
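A minimal sketch of how such a check could look at checkin/poll time (the option name comes from the suggestion above; it is not an existing db_connection option):

```elixir
defmodule ExpireCheck do
  @default_expire_timeout :timer.minutes(30)

  # `started_at` is the monotonic time (in ms) recorded when the connection was opened.
  def expired?(started_at, expire_timeout \\ @default_expire_timeout) do
    System.monotonic_time(:millisecond) - started_at >= expire_timeout
  end
end

# On checkin/poll, an expired connection would be disconnected and replaced.
started_at = System.monotonic_time(:millisecond)
false = ExpireCheck.expired?(started_at)
true = ExpireCheck.expired?(started_at - :timer.minutes(31))
```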
Today @DavidAntaramian asked on Slack about gathering some statistics about the pool, in particular stats about saturation and load of the pool for optimising and monitoring the application.
If we're designing a new pool, maybe it would make sense to also include some APIs to gather this data or at least provide some hooks so it could be integrated with some metrics library?
Also, regarding overflow: I think for database connections a way to dynamically change the pool size would actually be more useful than poolboy-style overflow, because of the connection cost.
If we had a way to gather metrics, it would become possible to have a component regulating the pool size (this could be integrated as an external library), growing it during overload and decreasing during quiet time, but smoothly.
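One hedged sketch of what such a hook could look like (nothing below is db_connection API): a periodic sampler that reads pool stats via a user-supplied function and hands them to a handler, which could forward them to a metrics library or to a component that resizes the pool.

```elixir
defmodule PoolSampler do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      # e.g. fn -> %{queue_len: ..., in_use: ...} end, however the pool exposes it
      sample: Keyword.fetch!(opts, :sample),
      # e.g. a metrics reporter or a pool-resizing component
      handle: Keyword.fetch!(opts, :handle),
      interval: Keyword.get(opts, :interval, 1_000)
    }

    {:ok, schedule(state)}
  end

  @impl true
  def handle_info(:sample, state) do
    state.handle.(state.sample.())
    {:noreply, schedule(state)}
  end

  defp schedule(state) do
    Process.send_after(self(), :sample, state.interval)
    state
  end
end

# Usage: {:ok, _} = PoolSampler.start_link(sample: fn -> %{queue_len: 0} end, handle: &IO.inspect/1)
```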
> If we're designing a new pool, maybe it would make sense to also include some APIs to gather this data or at least provide some hooks so it could be integrated with some metrics library?
Yes, definitely, but I think we can worry about metrics after the implementation. It will be easier to do once we have removed poolboy/sojourn; we do not need to get ahead of ourselves :D.
> If we had a way to gather metrics, it would become possible to have a component regulating the pool size (this could be integrated as an external library), growing it during overload and decreasing during quiet time, but smoothly.
This is what the Sojourn pool does already FWIW (and provides hookable metrics). I am slightly :-1: on any overflow feature to be honest because of the complexity.
@michalmuskala I removed the monitoring as it's not required; I'm unsure how minimal the performance increase is, but it simplifies the code slightly.
@fishcakez I got some failures when I tried the changes:
```
07:34:36.676 [error] GenServer #PID<0.187.0> terminating
** (CaseClauseError) no case clause matching: {-576460751509, #Reference<0.4040485667.3719692292.85506>}
    (db_connection) lib/db_connection/connection_pool.ex:155: DBConnection.ConnectionPool.handle_info/2
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: {:timeout, #Reference<0.4040485667.3719561220.85547>, {-576460750509, -576460751509}}
State: {:ready, #Reference<0.4040485667.3719692292.85497>, %{delay: 0, idle: #Reference<0.4040485667.3719561220.85548>, idle_interval: 1000, interval: 1000, next: -576460750509, poll: #Reference<0.4040485667.3719561220.85547>, slow: false, target: 50}}
```
Oh, there was a subtle bug in the old code that wasn't noticed until that change.
Here are some benchmarks with raw db_connection, without actually connecting to a database:
https://gist.github.com/michalmuskala/e503d650718d122fbdb478372b83c46a
And here are some with postgres connecting to a real database:
https://gist.github.com/michalmuskala/00d163ddd0c7eff1770b27cd28ad8dfb
Any updates?
I've moved to a raw message instead of a GenServer call. This should be ready to merge; changes to support the raw message, instead of an option, can come in later PRs.
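For illustration, the shape of the difference (the message format below is made up; the actual protocol here may differ): `GenServer.call/3` hides a monitor-and-receive like this inside the call, whereas with a raw message the contract is on messages rather than on functions.

```elixir
defmodule RawCheckout do
  # Hypothetical raw-message checkout; contrast with
  # `GenServer.call(pool, {:checkout, opts}, timeout)`, which builds the
  # monitor/receive below internally.
  def checkout(pool, opts, timeout \\ 5_000) do
    ref = Process.monitor(pool)
    send(pool, {:checkout, {self(), ref}, opts})

    receive do
      {^ref, reply} ->
        Process.demonitor(ref, [:flush])
        reply

      {:DOWN, ^ref, _, _, reason} ->
        {:error, reason}
    after
      timeout ->
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end
```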
Unique integer should be fast if it is not monotonic. Basically a simple counter per scheduler.
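For reference, the BIF in question: without `:monotonic` it is essentially a per-scheduler counter; with `:monotonic` it has to produce globally ordered values and is slower.

```elixir
# Unique but unordered: a cheap per-scheduler counter.
:erlang.unique_integer()

# Strictly monotonically increasing: requires coordination, noticeably slower.
:erlang.unique_integer([:monotonic])
```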
Merging to unblock work on moving ownership to use this pool. We can improve this pool further later on.
Adds a built-in connection pool that will increase throughput by 3-7% for most single queries and load levels in Ecto. Other workloads may get 0-10% improvements.
The pool uses an ETS queue and some other ETS trickery to avoid tracking busy connections/clients. The latter feature (if this were the only pool style) would allow us to stop using the pdict, so that we can ensure no data leaks for non-closure transactions.
We might be able to get better performance by removing either or both of these ETS use cases.
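A minimal sketch of the ETS-queue idea (illustrative, not the exact code in this PR): an `ordered_set` keyed by `{monotonic_time, unique_integer}` behaves as a FIFO queue where enqueue/dequeue are plain inserts and takes, with no process state tracking who is busy.

```elixir
queue = :ets.new(:checkout_queue, [:ordered_set, :public])

enqueue = fn item ->
  # Tuple keys sort element-wise, so monotonic time gives FIFO order and the
  # unique integer only breaks ties.
  key = {System.monotonic_time(), :erlang.unique_integer()}
  :ets.insert(queue, {key, item})
  key
end

dequeue = fn ->
  case :ets.first(queue) do
    :"$end_of_table" ->
      :empty

    key ->
      [{^key, item}] = :ets.take(queue, key)
      {:ok, item}
  end
end

enqueue.(:client1)
enqueue.(:client2)
{:ok, :client1} = dequeue.()
```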
IO overload handling is better (and simpler) than in the Sojourn pool. In future we could improve the case where the database is down or on the other side of a netsplit, because we can know when there are no connections.
CPU overload handling is not as good as the Sojourn pool's, but could be improved by limiting the queue size and dropping all new checkouts above that size.
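A sketch of that mitigation (not something this PR implements): bound the checkout queue and shed new requests once it is full.

```elixir
max_queue = 500
queue = :ets.new(:checkout_queue, [:ordered_set, :public])

try_enqueue = fn item ->
  # Refuse new checkouts instead of letting the queue grow without bound
  # under CPU overload.
  if :ets.info(queue, :size) >= max_queue do
    {:error, :pool_overloaded}
  else
    key = {System.monotonic_time(), :erlang.unique_integer()}
    :ets.insert(queue, {key, item})
    {:ok, key}
  end
end

{:ok, _key} = try_enqueue.(:client1)
```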
There is no overflow. This is something we could do in future if we have a simple and good algorithm.
Idle ping handling is improved over other pools because pings will never synchronize and happen as rarely as possible.
Timeout handling is improved over other pools because an absolute deadline can be used (useful for multiple-repo or multi-service calls) and the queue time is included in the timeout if it is not absolute. Also, queue timeouts are `DBConnection.ConnectionError` errors, like in the Sojourn pool, so they are easier to handle. Process hibernation occurs efficiently and does not affect performance. It could be more efficient if we hibernated on backoff.
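A small sketch of the deadline handling described above (illustrative; the actual internals may differ): the relative timeout is converted into an absolute monotonic deadline once, so time spent queued counts against the same budget.

```elixir
timeout = 5_000
deadline = System.monotonic_time(:millisecond) + timeout

# With abs: true the timer fires at `deadline`, regardless of how long the
# request already sat in the queue before being handled.
timer = :erlang.start_timer(deadline, self(), :checkout_timeout, abs: true)

receive do
  {:timeout, ^timer, :checkout_timeout} -> {:error, :queue_timeout}
after
  0 -> :still_waiting
end

:erlang.cancel_timer(timer)
```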
Overhead for `pool_size: 1` is insignificant compared to `DBConnection.Connection`, so this pool could replace it.