Open ivyazmitinov opened 1 year ago
Make citus.max_shared_pool_size cluster-wide. A per-database setting may then be used to set a connection quota for a database within the global citus.max_shared_pool_size
Seems reasonable, not sure why we decided to include database in the key.
Make the transaction recovery and distributed deadlock detection respect the improved citus.max_shared_pool_size.
I think as a short-term fix they probably should not count towards max_shared_pool_size, since there is not much to be gained from it.
The longer term fix is to avoid a persistent connection per database. Connections from the maintenance daemon are cached because of how frequently the deadlock detector asks for lock graphs. However, we don't actually need a deadlock detector per database. Just one is enough.
The other (database-specific) operations in the maintenance daemon are much less frequent and could afford to open a new connection every time, and either honour max_shared_pool_size (block if none available) or have a separate shared pool.
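A rough sketch of the per-operation variant, just to make it concrete (every name and type below is a placeholder for illustration, not an existing Citus function):

```c
/* Placeholder declarations -- none of these are real Citus functions; they
 * stand in for "take/release a slot in the shared pool" and "open/close a
 * short-lived maintenance connection". */
typedef struct MaintenanceConnection MaintenanceConnection;
extern void AcquireSharedPoolSlot(const char *nodeName, int nodePort); /* blocks while pool is exhausted */
extern void ReleaseSharedPoolSlot(const char *nodeName, int nodePort);
extern MaintenanceConnection *OpenMaintenanceConnection(const char *nodeName, int nodePort);
extern void CloseMaintenanceConnection(MaintenanceConnection *connection);

/* Sketch: instead of one cached connection per database, each maintenance
 * operation takes a slot from the shared pool (waiting if none is free),
 * does its work, then gives both the connection and the slot back. */
static void
RunMaintenanceOperation(const char *nodeName, int nodePort)
{
	AcquireSharedPoolSlot(nodeName, nodePort);	/* honour citus.max_shared_pool_size */

	MaintenanceConnection *connection = OpenMaintenanceConnection(nodeName, nodePort);

	/* ... run transaction recovery / lock-graph collection here ... */

	CloseMaintenanceConnection(connection);
	ReleaseSharedPoolSlot(nodeName, nodePort);
}
```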
Thanks for sharing your feedback on this!
not sure why we decided to include database in the key
According to the comment, it is there specifically to support the per-database setting :slightly_smiling_face:
The longer term fix is to avoid a persistent connection per database. Connections from the maintenance daemon are cached because of how frequently the deadlock detector asks for lock graphs. However, we don't actually need a deadlock detector per database. Just one is enough. The other (database-specific) operations in the maintenance daemon are much less frequent and could afford to open a new connection every time, and either honour max_shared_pool_size (block if none available) or have a separate shared pool.
Agree, this improvement looks great! I was more focused on limiting the number of connections to workers than on their caching, but removing this unnecessary caching will also be beneficial.
either honour max_shared_pool_size (block if none available) or have a separate shared pool.
I would go with a separate shared pool, but introduce an option to configure it as a percentage of max_shared_pool_size, like max_shared_pool_maintenance_percent, dedicated to maintenance operations only. This way all operations will be bound by the actual number of connection slots available, but maintenance operations won't compete with user queries.
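Roughly, the accounting I have in mind (illustrative numbers and names only; neither the GUC nor the function exists today):

```c
#include <stdbool.h>

/* Illustrative values, not proposed defaults. */
static int MaxSharedPoolSize = 100;              /* citus.max_shared_pool_size */
static int MaxSharedPoolMaintenancePercent = 10; /* proposed maintenance share */

/* A maintenance connection may only be opened while fewer than
 * (100 * 10 / 100) = 10 maintenance slots are in use, so maintenance stays
 * inside the same cluster-wide bound without starving user queries. */
static bool
MaintenanceSlotAvailable(int maintenanceSlotsInUse)
{
	int maintenanceQuota = MaxSharedPoolSize * MaxSharedPoolMaintenancePercent / 100;

	return maintenanceSlotsInUse < maintenanceQuota;
}
```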
Thanks for sharing your feedback on this!
I am actually up for a contribution (we have suffered from this long enough), so if you approve, I may start right away :slightly_smiling_face:
either honour max_shared_pool_size (block if none available) or have a separate shared pool.
I would go with a separate shared pool, but introduce an option to configure it as a percentage of max_shared_pool_size, like max_shared_pool_maintenance_percent, dedicated to maintenance operations only. This way all operations will be bound by the actual number of connection slots available, but maintenance operations won't compete with user queries.
The main reason we implemented citus.max_shared_pool_size is to prevent the coordinator from establishing more connections than any worker can accept (e.g., by keeping citus.max_shared_pool_size below the worker's max_connections).
I think making citus.max_shared_pool_size global (i.e., not per database) sounds useful and aligns with why we added it. One important caveat here is that citus.max_shared_pool_size cannot/should not be settable per database, otherwise things might become complicated. As far as I know, SIGHUP GUCs -- such as shared_pool_size -- cannot be set per database.
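For reference, a bare-bones example of why a SIGHUP GUC behaves that way (a generic sketch, not Citus' actual registration code; boot/min/max values are illustrative). Because the context is PGC_SIGHUP, the value comes from postgresql.conf or ALTER SYSTEM on reload and cannot be overridden via ALTER DATABASE ... SET:

```c
#include "postgres.h"

#include <limits.h>

#include "utils/guc.h"

static int MaxSharedPoolSize = 0;

/* would be called from the extension's _PG_init() */
static void
RegisterMaxSharedPoolSizeGuc(void)
{
	DefineCustomIntVariable(
		"citus.max_shared_pool_size",
		gettext_noop("Limits the number of connections to each worker node."),
		NULL,
		&MaxSharedPoolSize,
		0,			/* boot value (illustrative) */
		-1,			/* min (illustrative) */
		INT_MAX,	/* max */
		PGC_SIGHUP,	/* reload-time setting: one value per node, never per database */
		0,			/* flags */
		NULL, NULL, NULL);
}
```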
When it comes to a separate pool vs some percentage in the pool, I'm very much in favor of the latter. The former could be very hard to implement.
I think even the latter is non-trivial to implement; there are a bunch of areas that need to be thought through, such as making sure that this is not broken: https://github.com/citusdata/citus/blob/main/src/backend/distributed/connection/locally_reserved_shared_connections.c or https://github.com/citusdata/citus/blob/main/src/backend/distributed/executor/adaptive_executor.c#L4707
So, please make sure we have good test coverage involving these areas as well as base scenarios with multiple databases.
The longer term fix is to avoid a persistent connection per database
also, see https://github.com/citusdata/citus/issues/3671 as a reference
I am actually up for a contribution (we have suffered from this long enough), so if you approve, I may start right away 🙂
Great!
Thanks for the useful insights, @onderkalaci!
Before I start, I have one question regarding the deadlock detector:
However, we don't actually need a deadlock detector per database. Just one is enough
I assume the way to solve this is to have one "management" database specifically for such cluster-wide procedures, and that this is being implemented now in the pool_2pc branch?
I assume the way to solve this is to have one "management" database specifically for such cluster-wide procedures, and that this is being implemented now in the pool_2pc branch?
Yes, I think so. The goal of that infrastructure is to be able to sync and track role & database-related commands across the cluster by funneling them all through the management database (even from non-Citus databases). We could then also decide to run the deadlock detector only in the management database.
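To illustrate what that could look like in the maintenance daemon (placeholder names only, not the current implementation):

```c
#include "postgres.h"

#include "miscadmin.h"

/* Placeholders for illustration; not existing Citus functions. */
extern bool IsManagementDatabase(Oid databaseId);
extern void CheckForDistributedDeadlocksAcrossCluster(void);
extern void RunPerDatabaseMaintenance(void);

/* Sketch: every database keeps its maintenance daemon, but only the daemon
 * attached to the management database polls lock graphs for distributed
 * deadlock detection, so one detector serves the whole cluster. */
static void
MaintenanceDaemonIteration(void)
{
	if (IsManagementDatabase(MyDatabaseId))
	{
		CheckForDistributedDeadlocksAcrossCluster();
	}

	/* database-scoped work such as 2PC recovery stays per database */
	RunPerDatabaseMaintenance();
}
```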
This issue is more of an invitation to a technical discussion than a solid proposal.
Currently, the main issue with having multiple DATABASEs in a cluster is connection management, which is not adapted to this. The issue can be split into two parts:

1. Transaction recovery and distributed deadlock detection do not respect citus.max_shared_pool_size and open at least n * d connections per worker, where n is the number of nodes in the cluster and d is the number of DATABASEs.
2. As for citus.max_shared_pool_size, this limit is applied per DATABASE, meaning that it will still require n * d connections.

In order to overcome this, I propose to:

1. Make citus.max_shared_pool_size cluster-wide. A per-database setting may then be used to set a connection quota for a database within the global citus.max_shared_pool_size.
2. Make the transaction recovery and distributed deadlock detection respect the improved citus.max_shared_pool_size.

Since those changes make sense mostly for the multi-database setup, they may be enabled by a single GUC, like citus.multi_database, or by separate GUCs for the behaviour of citus.max_shared_pool_size, transaction recovery, and distributed deadlock detection.