Open jasonmp85 opened 6 years ago
It's slightly vexatious to have another representation than libpq
s, but maybe okay considering there are other non-libpq related options at stake and one representation may be better than two.
How's the progress on this one coming? Ran into this one, as I thought I could create another user and use that as a connection but I guess what didn't work. (or maybe I was doing something wrong)
Citus has long lacked a facility for specifying much of anything other than a host and port name for worker machines. While this simple
<host, port>
approach works for the simplest of cases, when more complex addressing systems, SSL configuration, authentication schemes/rotations, special routing for connection pools vs not, and non-trivial host name generation requires effort from end-users to implement, resulting in storage of these concerns in a manner external to Citus and often brittle.For instance, deployments needing passwords must write software to deploy and maintain
pgpass
files in specific locations on the filesystem. If they become out-of-sync with what is required by the remote nodes, connections establishment will fail. Debugging this can be difficult, given the need to inspect thepgpass
files to determine what is wrong (or to verify that they are even being picked up at all). This is all without even considering the risk of storing passwords in plain text on a filesystem.Proposal Details
To address these needs, we propose extending Citus' node metadata to allow arbitrary
libpq
connection parameters on a per-node basis. To avoid having to uselibpq
's query string parsing functions (and to use better structure, generally) we plan to use theJSONB
type for this column.Additionally, we will extend the dictionary of
libpq
parameters with some Citus-specific ones to address certain needs (i.e. addressing a node directly or through a connection pool such aspgbouncer
). Further parameters may be added as common pain points continue to be identified.Because of the sensitivity of authentication information, this new column must not contain passwords. Besides, typical deployments assign passwords on a
<user, host>
-unique basis, not simply by host. Due to this, a superuser-only table will be added to store such information. Rows will reference an existingpg_dist_node
row and adduser
andpassword
information. In keeping with the spirit of existingpgpass
functionality,*
will be accepted in theuser
field as a wildcard match for all users (a similar accommodation could be made to allow a given user to have the same password on all hosts).Finally, we have come across deployments where the host and user name used during connection establishment do not match the simple values available at runtime (i.e. the user name differs from PostgreSQL's internal
GetUserId
). We have yet to find a case where a simple templating mechanism could not address these concerns. As such, fields where the value might be a host or user name will accept a simple templating system in which%%HOSTNAME
and%%USERNAME%%
will be replaced by the worker node name (as specified inpg_dist_node
) and current user name (as calculated by our existingCurrentUserName
method).Implementation Concerns
The question arose: do we replicate all of this new information to schema nodes in an MX deployment? We've decided for now to only replicate the non-sensitive connection options, requiring users to manually propagate their password information. If this is too unexpected, we can always simply change to propagate both sets of information.
Unrelatedly, within Citus we do a lot of work to ensure connections can be looked up within a cache to mitigate the impact of connection establishment time on performance. At present, connections are looked up by
<host, port, user>
, but this will be insufficient if we start differentiating between pooler and direct connections to a given host. Simply adding more fields to the key should be sufficient.Tasks
We expect this work to take longer than one week to complete, but less than a full two. How familiar the implementor is with the existing metadata/connection access/caching may affect implementation time in one way or the other.
connopts
column topg_dist_node
— Schema change and access logic. Ensure upgrade path from existing deployments and add logic to ban certain keys (such as password). Two dayspgpass
users. Two daysStretch/Later
Some additional functionality can wait for later.
pgbouncer
/direct modes — Have to figure out which operations (COPY
, rebalance, replication, even DDL?, etc.) need direct access to nodes and which can go throughpgbouncer
. Add new optional param to specify how to access pooler separate from direct connectionsdocker-compose
to use this new functionality (generate a random password at startup and use that) rather than using non-authenticated connections.Gotchas
I believe existing tests should be able to function just fine without modification, but some more complex multi-user tests might require some changes to operate under the new system.