GoogleCloudPlatform / cloud-sql-python-connector

A Python library for connecting securely to your Cloud SQL instances.
Apache License 2.0

Consider support for psycopg3 driver. #219

Open jackwotherspoon opened 2 years ago

jackwotherspoon commented 2 years ago

Monitor interest and adoption of psycopg3 driver for potential support with Cloud SQL connector.

khaerulumam42 commented 2 years ago

very interesting to watch this development, thank you team!

jackwotherspoon commented 1 year ago

Re-commenting what I commented on #214 as it also applies to psycopg3

FYI to everyone: it is possible to connect with automatic IAM authentication without the need for the Python Connector.

You'll need to ensure a few things first (see the prerequisites described in #214).

We're working on making this path easier for folks, but for now I'll share the mechanics for visibility.

Assuming you're using SQLAlchemy, you can do this:

import sqlalchemy
from sqlalchemy import event

import google.auth
from google.auth.transport.requests import Request

# initialize ADC creds
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/sqlservice.login"]
)

# Cloud SQL instance IP address (placeholder)
instance_ip = "<INSTANCE_IP>"

# create SQLAlchemy connection pool
# use Cloud SQL instance IP + native port (5432)
# for best security use client certificates + server cert for SSL
engine = sqlalchemy.create_engine(
    # "empty-pass" is a placeholder; the real password (an OAuth2 token)
    # is set by the do_connect event listener below
    f"postgresql+psycopg://service-account@project-id.iam:empty-pass@{instance_ip}:5432/dbname",
    connect_args={"sslmode": "require"},
)

# set do_connect event listener
# This gets called before a connection is created and allows you to
# refresh the OAuth2 token here as needed
@event.listens_for(engine, 'do_connect')
def auto_iam_authn(dialect, conn_rec, cargs, cparams):  
    # refresh credentials if expired
    if not creds.valid:
        request = Request()
        creds.refresh(request)

    # use OAuth2 token as password
    cparams["password"] = str(creds.token)

# interact with Cloud SQL database using connection pool
with engine.connect() as conn:
    # query database
    time = conn.execute(sqlalchemy.text("SELECT NOW()")).fetchone()
    print(time[0])
Radolumbo commented 7 months ago

FYI for anyone looking to make use of psycopg and who wants to continue relying on Google's Cloud SQL Auth Proxy and not deal with VPCs, there is a way!

I had so many problems using pg8000 with Google Cloud SQL. Since making the switch to psycopg, I haven't had a single database connection error and performance is significantly better.

Thank you so much to @jackwotherspoon for his IAM solution; I would never have gotten here without it! I combined that approach with the Auth Proxy and unix sockets as documented in https://cloud.google.com/sql/docs/postgres/connect-run (when a Cloud SQL connection is configured for Cloud Run, the database is exposed via a unix socket, which I assume is what the connector libraries leverage under the hood). A nice bonus is that this also makes connecting to your databases from a local machine easy, since you can run cloud-sql-proxy with the -u /cloudsql flag to expose the unix socket.

This is basically how I ended up solving it (building a singleton so that a single engine is shared across my service):

import os
import threading

import google.auth
from google.auth.transport.requests import Request
import sqlalchemy
from sqlalchemy import event
from sqlalchemy.orm import Session, sessionmaker

class PostgresDBEngineProvider:
    """
    This class is used to create a singleton instance of the sqlalchemy engine.
    To use it, the following environment variables must be set:
    - CLOUD_SQL_CONNECTION_NAME
    - CLOUD_SQL_USER
    - CLOUD_SQL_UNIX_SOCKET_ROOT (optional, defaults to /cloudsql)
    """

    _instance: "PostgresDBEngineProvider" = None
    _lock = threading.Lock()
    _iam_creds = None

    def __new__(cls) -> "PostgresDBEngineProvider":
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._iam_creds, _ = google.auth.default(
                    scopes=["https://www.googleapis.com/auth/sqlservice.login"]
                )
                cls._instance._engine = cls.__create_engine()
                cls._instance._sessionmaker = sessionmaker(bind=cls._instance._engine)
        return cls._instance

    def get_engine(self) -> sqlalchemy.engine.base.Engine:
        """
        Returns the singleton instance of the sqlalchemy engine.
        """
        return self._instance._engine

    def get_session(self) -> Session:
        """
        Returns a new session from the singleton instance of the sqlalchemy engine.
        """
        return self._instance._sessionmaker()

    # This gets called before a connection is created and allows you to
    # refresh the OAuth2 token here as needed
    # Part of a workaround for using IAM authentication with psycopg
    # as described in https://github.com/GoogleCloudPlatform/cloud-sql-python-connector/issues/219
    @classmethod
    def __auto_iam_authn(cls, **kwargs):
        if not cls._iam_creds.valid:
            request = Request()
            cls._iam_creds.refresh(request)

        # use OAuth2 token as password
        kwargs["cparams"]["password"] = str(cls._iam_creds.token)

    @classmethod
    def __create_engine(cls) -> sqlalchemy.engine.base.Engine:
        engine = sqlalchemy.create_engine(
            # "empty-pass" is a placeholder for the password, which will get set
            # dynamically by the do_connect event listener
            # URL.create() is the supported constructor in SQLAlchemy 1.4+/2.0
            sqlalchemy.engine.url.URL.create(
                drivername="postgresql+psycopg",
                username=os.getenv("CLOUD_SQL_USER"),
                password="empty-pass",
                database="my-db",
                query={
                    "host": f"{os.getenv('CLOUD_SQL_UNIX_SOCKET_ROOT', '/cloudsql')}/{os.getenv('CLOUD_SQL_CONNECTION_NAME')}"
                },
            ),
            # See https://docs.sqlalchemy.org/en/20/core/pooling.html#disconnect-handling-pessimistic 
            pool_pre_ping=True,
        )
        event.listen(engine, "do_connect", cls.__auto_iam_authn, named=True)

        return engine

Where CLOUD_SQL_CONNECTION_NAME is something like my-project:us-central1:my-db and CLOUD_SQL_USER is something like my-service-account@my-project.iam.
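For illustration (not part of the original comment), a minimal usage sketch of the provider above, assuming those environment variables are set:

import sqlalchemy

# both calls below are backed by the same singleton engine,
# so one connection pool is shared across the service
provider = PostgresDBEngineProvider()

with provider.get_session() as session:
    now = session.execute(sqlalchemy.text("SELECT NOW()")).scalar_one()
    print(now)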

You'll want to install psycopg[binary,pool] in your requirements, btw. Using the binary extra rather than c isn't ideal, but I'm relying on buildpacks via Google's build submit functionality to create my images and could not find an easy way to get it to install the system libraries needed for the c extra.

Also note for local development that if you're using macOS, you cannot create directories at the root level. To get around this, you can set the unix socket elsewhere, e.g. /Users/Shared/cloudsql. Then, when running applications that need to access Cloud SQL, set the env var CLOUD_SQL_UNIX_SOCKET_ROOT to that directory.

jackwotherspoon commented 7 months ago

@Radolumbo Thanks for another great solution! This is a cool one. We will actually be adding support for psycopg2 and psycopg3 to the Python Connector in the coming months, using a solution very similar to what you've described the Proxy doing 😄

We will have the Python Connector create the unix socket and use it to connect, which is what the Proxy sets up when you add a "Cloud SQL Connection" to Cloud Run.

Hopefully once we add this feature to the Python Connector, connecting to Cloud SQL with psycopg will be even easier for you.

colinrsmall commented 3 months ago

We will actually be adding support for psycopg2 and psycopg3 to the Python Connector in the coming months

Are we getting close to this? Appreciate all the effort you guys are putting in! 😁

jackwotherspoon commented 3 months ago

We will actually be adding support for psycopg2 and psycopg3 to the Python Connector in the coming months

Are we getting close to this? Appreciate all the effort you guys are putting in! 😁

Yes I am currently working on the design internally and should have a PR up soon 🤞

kosty commented 1 month ago

@jackwotherspoon Thanks for all the great work answering those questions on GitHub, very much appreciated. Just adding my 2 cents here - would be really excited to see psycopg3 land.

jackwotherspoon commented 1 month ago

@jackwotherspoon Thanks for all the great work answering those questions on GitHub, very much appreciated. Just adding my 2 cents here - would be really excited to see psycopg3 land.

Thanks @kosty, we have now finished the preliminary design. We are leaning towards creating a local unix socket for psycopg connections, similar to how the Cloud SQL Proxy operates. This would actually allow almost any database driver to be supported via the Python Connector. Hoping to begin the dev work here shortly.