citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.43k stars 662 forks source link

Connection Configuration Improvements #2019

Open jasonmp85 opened 6 years ago

jasonmp85 commented 6 years ago

Citus has long lacked a facility for specifying much of anything other than a host and port name for worker machines. While this simple <host, port> approach works for the simplest of cases, when more complex addressing systems, SSL configuration, authentication schemes/rotations, special routing for connection pools vs not, and non-trivial host name generation requires effort from end-users to implement, resulting in storage of these concerns in a manner external to Citus and often brittle.

For instance, deployments needing passwords must write software to deploy and maintain pgpass files in specific locations on the filesystem. If they become out-of-sync with what is required by the remote nodes, connections establishment will fail. Debugging this can be difficult, given the need to inspect the pgpass files to determine what is wrong (or to verify that they are even being picked up at all). This is all without even considering the risk of storing passwords in plain text on a filesystem.

Proposal Details

To address these needs, we propose extending Citus' node metadata to allow arbitrary libpq connection parameters on a per-node basis. To avoid having to use libpq's query string parsing functions (and to use better structure, generally) we plan to use the JSONB type for this column.

Additionally, we will extend the dictionary of libpq parameters with some Citus-specific ones to address certain needs (i.e. addressing a node directly or through a connection pool such as pgbouncer). Further parameters may be added as common pain points continue to be identified.

Because of the sensitivity of authentication information, this new column must not contain passwords. Besides, typical deployments assign passwords on a <user, host>-unique basis, not simply by host. Due to this, a superuser-only table will be added to store such information. Rows will reference an existing pg_dist_node row and add user and password information. In keeping with the spirit of existing pgpass functionality, * will be accepted in the user field as a wildcard match for all users (a similar accommodation could be made to allow a given user to have the same password on all hosts).

Finally, we have come across deployments where the host and user name used during connection establishment do not match the simple values available at runtime (i.e. the user name differs from PostgreSQL's internal GetUserId). We have yet to find a case where a simple templating mechanism could not address these concerns. As such, fields where the value might be a host or user name will accept a simple templating system in which %%HOSTNAME and %%USERNAME%% will be replaced by the worker node name (as specified in pg_dist_node) and current user name (as calculated by our existing CurrentUserName method).

Implementation Concerns

The question arose: do we replicate all of this new information to schema nodes in an MX deployment? We've decided for now to only replicate the non-sensitive connection options, requiring users to manually propagate their password information. If this is too unexpected, we can always simply change to propagate both sets of information.

Unrelatedly, within Citus we do a lot of work to ensure connections can be looked up within a cache to mitigate the impact of connection establishment time on performance. At present, connections are looked up by <host, port, user>, but this will be insufficient if we start differentiating between pooler and direct connections to a given host. Simply adding more fields to the key should be sufficient.

Tasks

We expect this work to take longer than one week to complete, but less than a full two. How familiar the implementor is with the existing metadata/connection access/caching may affect implementation time in one way or the other.

Stretch/Later

Some additional functionality can wait for later.

Gotchas

I believe existing tests should be able to function just fine without modification, but some more complex multi-user tests might require some changes to operate under the new system.

fdr commented 6 years ago

It's slightly vexatious to have another representation than libpqs, but maybe okay considering there are other non-libpq related options at stake and one representation may be better than two.

cyraid commented 3 months ago

How's the progress on this one coming? Ran into this one, as I thought I could create another user and use that as a connection but I guess what didn't work. (or maybe I was doing something wrong)