GoogleCloudPlatform / cloud-sql-go-connector

A Go library for connecting securely to your Cloud SQL instances
Apache License 2.0
110 stars 29 forks source link

How to avoid API rate limits with a high number of connection pools? #807

Closed GusPrice closed 5 months ago

GusPrice commented 5 months ago

Question

When I first implemented cloudsqlconn I ran into API rate limiting issues and had to rollback my changes. I am looking for advice or tips on how to best handle the following set up:

  1. Multi tenant environment, each tenant is a database on a postgres cloud sql instance. Currently ~150 tenants, but that number is expected to keep growing.
  2. Our primary Cloud Run deployment generally has about 5 instances
  3. PGX Pool per tenant with a minimum connection count of 2

Currently, on application start, we create a new pool for each tenant that connects specifically to its database on the instance using its own specific db user and password. This means that we have 150 5 (2 maybe depending on if the certs are cached) requests minimum (750 requests) when we do a new deployment. We have additional requests around the same time for one time actions on deployment, like database migrations. So we quickly blew through the 1000 requests/min api limit.

I understand that the connector has caching for the certificate information etc, but is there any way to share that across connector instances/pgxpools? I'm struggling to see how we could use this tool in our current architecture or how to change our architecture to a better configuration.

Code

No response

Additional Details

I left out any code because our usage is more or less a direct copy paste of the examples in this repo.

enocom commented 5 months ago

A few questions:

GusPrice commented 5 months ago

Thanks for the questions!

func openDB(ctx context.Context, dsn, connName string, opts ...PgxPoolOption) (*sql.DB, error) {
    // create a pgxpool config and apply options and config
    cfg, err := pgxpool.ParseConfig(dsn)
    if err != nil {
        return nil, err
    }

    for _, opt := range opts {
        opt(cfg)
    }

    // Create a new dialer with option for private IP
    d, err := cloudsqlconn.NewDialer(ctx, cloudsqlconn.WithDefaultDialOptions(cloudsqlconn.WithPrivateIP()))
    if err != nil {
        return nil, err
    }

    // Tell the driver to use the Cloud SQL Go Connector to create connections
    cfg.ConnConfig.DialFunc = func(ctx context.Context, _ string, instance string) (net.Conn, error) {
        return d.Dial(ctx, connName)
    }

    // instantiate the pgxpool from the config
    connPool, err := pgxpool.NewWithConfig(ctx, cfg)
    if err != nil {
        return nil, errors.Wrap(err, "call to pgx.NewConnPool failed")
    }

    // retrieve a *sql.DB from the pgxpool
    db := stdlib.OpenDBFromPool(connPool)

    return db, nil
}

This function get's called in a for loop across our tenants and makes a map:

    for _, t := range tenants {
        connectionMap[t.ID] = openDB(ctx, t.DSN, connName)
    }

we pass use this connection map to select an appropriate connection based on tenant being accessed.

enocom commented 5 months ago

Currently creating one dialer per tenant/pgxpool. Would it be reasonable to share the dialer across pools, that seems like it might be the solution here if it stores the certs by instance and not DSN (since our DSNs are different)

Yes -- that's your best option here. The dialer caches client certificates on a per instance basis, not a per database basis. So you'll immediately solve your problem this way. You can continue to configure your pgxpools to call through to d.Dial to keep the pool per tenant design.

GusPrice commented 5 months ago

Ah, wow, love when it's just a super easy solution. I will give this new approach a whirl in the next couple days and come back to confirm success. Thank you!

enocom commented 5 months ago

Let me know how that goes. We can explore additional options if you hit quota issues again.

GusPrice commented 5 months ago

Sorry two more tangentially related questions:

  1. I noticed in another open issue someone had deferred closing the dialer, but I don't see that in the examples page. Is that something I should be doing?
  2. I also noticed in the examples sslmode is always set to disable(). Is that due to fully relying on this package to handle that?
enocom commented 5 months ago
  1. Yes, but only in your main function or similar. When you close the dialer, it stops accepting new connections and stops the background certificate refresh goroutines.
  2. Yes, exactly. Postgres drivers by default use "prefer" which means they try SSL first, and then if that fails, fall back to disable. Since the Dialer is creating the TLS connection for you, there's no reason to make the driver try an SSL connection. And so instead, it's best to just skip straight to "disable."
GusPrice commented 5 months ago

Live with the shared dialer and things seem to be going much more smoothly. Down to ~50 requests during service deploy (across a variety of services) from what I think peaked a bit over 5000 before.

Thanks again for the help!

enocom commented 5 months ago

Happy to hear it! :clap: