CrunchyData / crunchy-proxy

PostgreSQL Connection Proxy by Crunchy Data (beta)
Apache License 2.0
414 stars 61 forks source link
golang postgresql

= Installation Guide - Crunchy Proxy Crunchy Data Solutions, Inc. v0.0.1, {docdate} image::docs/crunchy_logo.png?raw=true[]

== Project

crunchy-proxy is a minimal PostgreSQL-aware proxy used to handle PostgreSQL application requests. The diagram below depicts a PostgreSQL client application connecting to the crunchy-proxy, which appears to the PostgreSQL application as any other PostgreSQL database connection. The crunchy-proxy accepts the inbound client requests, and routes them to an appropriate PostgreSQL cluster member (as described in more detail below).

image::docs/proxy-diagram.png?raw=true["scaledwidth="50%"]

To the PostgreSQL server, the crunchy-proxy is transparent and appears as any other PostgreSQL client would appear. As described in more detail below, the crunchy-proxy currently provides the following capabilities:

As described under "Approach" below, implementation of the crunchy-proxy requires that (1) a user has access to their application's SQL and (2) a user has the ability to add basic annotations to their SQL statements.

== Approach

One of the key features of the crunchy-proxy is the ability to route different SQL statements to a PostgreSQL cluster members based on certain pre-defined rules. This capability enables end-user applications to see a PostgreSQL cluster as a single connection. In order to support this capability, it is necessary that crunchy-proxy have knowledge of the SQL statements that a PostgreSQL application sends to PostgreSQL.

One approach to addressing the requirement that a proxy be "SQL aware" (e.g. link:http://www.pgpool.net/mediawiki/index.php/Main_Page[pgpool]) is to first parse each SQL statement looking for grammar that would imply a write or update and to then send those SQL statements to the cluster master. SQL statements that are determined to be read-only in nature are deemed safe to send to the cluster replica members. Remember, in PostgreSQL cluster replica members are only able to process SQL that is read-only in nature.

crunchy-proxy takes a different approach to SQL routing. Instead of attempting to parse the entire SQL grammer of each SQL statement, it looks for a simple annotation, supplied by the end-user within their SQL statement, to determine the routing destination.

For example, the following SQL statement is annotated to indicate it is a read-only SQL statement: .... / read / select now(); ....

Examples of a write SQL statement include the following: .... create table boofar (id int); drop table boofar (id int); ....

If a SQL statement does not include an annotation, the the statement is deemed a write and thus sent to the master cluster member.

By parsing only the annotation, crunchy-proxy simplifies the complexity associated with determining whether a SQL statement is a write or read and thus to which member (master or replica) of a PostgreSQL cluser to send a SQL statement.

In taking this approach, crunchy-proxy has assumed (1) a user has access to their application's SQL and (2) a user has the ability to add the annotation in their SQL statements. If they do, then they can use the crunchy-proxy for SQL routing.

Of course these assumptions introduce certain limitations on the crunchy-proxy. Nonetheless, it was determined that these assumptions will not be unduly limiting in the usability of the crunchy-proxy and that the resulting limiations are justified by the benefits of (1) reduction in complexity associated with SQL parsing implementation, (2) increase in proxy throughput and (3) improved routing accuracy of the SQL parsing.

== PostgreSQL Wire Protocol

crunchy-proxy operates at the PostgreSQL wire protocol (network) layer to understand PostgreSQL client authentication requests and SQL statements passed by a client to a PostgreSQL backend.

As crunchy-proxy uses annotations to route messages to the backend, the proxy primarily examines SQL statements for proxy-specific annotations and does very little processing of the messages sent between a client and an actual backend.

Its important to note that the proxy does not implement all features of libpq or provide an application interface similar to a JDBC driver or other language driver.

The following resources are useful in understanding the PostgreSQL wire protocol:

In the future, by working at the wire protocol level, crunchy-proxy can implement a variety of features important for high speed proxy handling and for supporting PostgreSQL features.

== Execution

The proxy is a golang binary, you execute it as follows: .... $> crunchy-proxy start --config=config.yaml ....

To run the proxy at different logging output levels:

.... $> crunchy-proxy start --config=config.yaml --log-level= ....

Where is one of the following:

Detailed documentation including configuration file format and developer information is found in the link:docs/crunchy-proxy-user-guide.asciidoc[User Guide]

For Docker users, you can run the proxy using the link:bin/run-docker.sh[run-docker.sh] script.

== Feedback

If you find a bug, or want to provide feedback on the design and features feel free to create a github issue.

== Legal Notices

Copyright © 2017 Crunchy Data Solutions, Inc.

CRUNCHY DATA SOLUTIONS, INC. PROVIDES THIS GUIDE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Crunchy, Crunchy Data Solutions, Inc. and the Crunchy Hippo Logo are trademarks of Crunchy Data Solutions, Inc.