cyrilgdn / terraform-provider-postgresql

Terraform PostgreSQL provider
https://www.terraform.io/docs/providers/postgresql/
Mozilla Public License 2.0
389 stars 200 forks source link

Operation timed out trying to connect to GCP DB via proxy #45

Open wilhelmi opened 3 years ago

wilhelmi commented 3 years ago

Hi there,

Thank you for opening an issue. Please provide the following information:

Terraform Version

0.13.6

Affected Resource(s)

Terraform Configuration Files

resource "google_sql_database_instance" "isa" {
  name                = var.db_instance
  database_version    = var.db_version
  deletion_protection = var.data_deletion_protection

  settings {
    tier = var.db_tier

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.primary.id
    }

  }

  timeouts {
    create = "20m"
    delete = "20m"
  }
}

resource "google_sql_user" "admin" {
  name     = "admin"
  instance = google_sql_database_instance.isa.name
  password = var.admin_db_pw
}

provider "postgresql" {
  username         = google_sql_user.admin.name
  password         = google_sql_user.admin.password
  scheme           = "gcppostgres"
  host             = google_sql_database_instance.isa.connection_name
  port             = 5432
  expected_version = "11.0.0"
}

resource "postgresql_database" "my_db" {
  name              = "my_db"
  owner             = google_sql_user.admin.name
  template          = "template0"
  lc_collate        = "C"
  connection_limit  = -1
  allow_connections = true
}

Debug Output

Expected Behavior

The new database is created, via cloud_sql_proxy tunnel provided by the go cloud lib.

Actual Behavior

postgresql_database.my_db: Still creating... [1m10s elapsed]

Error: Error creating database "my_db": dial tcp 10.35.0.3:3307: connect: operation timed out

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

I can connect to the DB via local cloud_sql_proxy. DB does not have a public IP.

nadirsidi commented 3 years ago

EDIT: My issue ended up being due to network access control list (NACL) settings that were blocking the traffic between the subnet with my CodeBuild project and the subnet with RDS Aurora. My mistake!


I am seeing similar behavior when trying to connect to an RDS Aurora cluster endpoint. I get the "Still creating..." message until my project in AWS CodeBuild eventually times out.

Here is my provider configuration. Perhaps there is an issue with connecting via the GoCloud? I know this is new in version 1.11.0.

provider "postgresql" {
  scheme = "awspostgres"
  host = data.aws_rds_cluster.data_science_postgresql.endpoint
  port = data.aws_rds_cluster.data_science_postgresql.port
  database = data.aws_rds_cluster.data_science_postgresql.database_name
  username = data.aws_rds_cluster.data_science_postgresql.master_username
  password = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["password"]
  superuser = false
}

Like @wilhelmi, I am also connecting to a DB that only has a private DNS name (in a private subnet), and I am running Terraform from a container that is running in AWS CodeBuild in the same subnet.

I have combed through the TF_LOG=TRACE output, but I don't see any messages about connection issues. My RDS cluster also does not register a connection on the DB side.

cyrilgdn commented 3 years ago

@wilhelmi Thanks for opening the issue.

It will be hard for us to debug your problem as it seems to be more network configuration problems. Did you get new informations about this problem?

But I'll try to execute your example as soon as I can to see if it works no my side.

cfloody commented 3 years ago

I am also getting the exact same error. I have an existing database cluster that was created successfully from Terraform:

resource "aws_db_instance" "internal_pg" {
  allocated_storage         = 20
  db_subnet_group_name      = aws_db_subnet_group.internal_pg.name
  vpc_security_group_ids    = [aws_security_group.internal_pg.id]
  storage_type              = "gp2"
  engine                    = "postgres"
  engine_version            = "12.5"
  instance_class            = "db.m6g.large"
  name                      = "db_pg_us_west_2_staging"
  identifier                = "db-pg-us-west-2-staging"
  final_snapshot_identifier = "deleted-db-pg-us-west-2-staging"
  username                  = "db_admin"
  password                  = "password"
  parameter_group_name      = "postgres12"
}

And then when I try to create an additional postgres db by appending to the same Terraform file:

provider "postgresql" {
  host             = aws_db_instance.internal_pg.address
  port             = aws_db_instance.internal_pg.port
  username         = aws_db_instance.internal_pg.username
  password         = aws_db_instance.internal_pg.password
  expected_version = aws_db_instance.internal_pg.engine_version
  sslmode          = "require"
}

resource "postgresql_database" "internal_pg" {
  name = "another_db"
}

And running terraform apply:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # postgresql_database.internal_pg will be created
  + resource "postgresql_database" "internal_pg" {
      + allow_connections = true
      + connection_limit  = -1
      + encoding          = (known after apply)
      + id                = (known after apply)
      + is_template       = (known after apply)
      + lc_collate        = (known after apply)
      + lc_ctype          = (known after apply)
      + name              = "another_db"
      + owner             = (known after apply)
      + tablespace_name   = (known after apply)
      + template          = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "internal-stg"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

postgresql_database.internal_pg: Creating...
postgresql_database.internal_pg: Still creating... [10s elapsed]
postgresql_database.internal_pg: Still creating... [20s elapsed]
postgresql_database.internal_pg: Still creating... [30s elapsed]
postgresql_database.internal_pg: Still creating... [40s elapsed]
postgresql_database.internal_pg: Still creating... [50s elapsed]
postgresql_database.internal_pg: Still creating... [1m0s elapsed]
postgresql_database.internal_pg: Still creating... [1m10s elapsed]

Error: Error creating database "another_db": dial tcp 10.232.57.183:5432: connect: operation timed out

  on main.tf line 175, in resource "postgresql_database" "internal_pg":
 175: resource "postgresql_database" "internal_pg" {

And my providers.tf file looks like:

terraform {
  backend "local" {
  }
  required_providers {
    postgresql = {
      source = "cyrilgdn/postgresql"
      version = "1.11.1"
    }
  }
}
cyrilgdn commented 3 years ago

Hi @cfloody ,

From where are your running Terraform? 10.232.57.183 is a private IP, are you running it from within the VPC?

nadirsidi commented 3 years ago

@cfloody, I ran into this issue last week and my set-up is also in a private subnet. For me, I had to adjust my NACL to allow traffic between the private subnet running Terraform and the private subnet where I launched my RDS cluster. You should also check your security group rules and route tables. Any of those three things could potentially be blocking the traffic, in which case you'll see the behavior you're seeing. I was able to troubleshoot this was a network issue by trying to connect with psql, tweaking the network settings until that worked, then returning to the Terraform stuff.

For reference, here is the provider configuration that works for me. Note, I am using aurora postgres serverless. I am also reading the RDS connection information from a data provider because I launched my cluster from a different Terraform configuration.

provider "postgresql" {
  scheme = var.pg_scheme  # postgres provider 1.11+ specifies awspostgres for RDS, but the GO CDK doesn't verify the x509 cert correctly
  host = data.aws_rds_cluster.data_science_postgresql.endpoint
  port = data.aws_rds_cluster.data_science_postgresql.port
  database = data.aws_rds_cluster.data_science_postgresql.database_name
  username = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["username"]
  password = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["password"]
  superuser = false
  sslmode = "require"
  sslrootcert = "/code/manifest/AmazonRootCA1.pem"
  expected_version = "10.12"
}

You can get the AWS Root Cert from the RDS documentation here, https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html. I found this worked better than letting the GO CDK do it, because that was returning x509 cert errors. This seems to be a GO-AWS thing, but folks are working on it.

wilhelmi commented 3 years ago

I am running this from a local workstation outside of GCP, the target in this case. From my reading of the gocloud library it should be using the cloud-sql-proxy lib to connect to the DB. This should connect to the DB even though it only has a private IP. I can connect to the target DB from the same machine if I start cloud-sql-proxy from the CLI. So I think the network route works. One thing that jumped at me was the 3307 port when 5432 was specifically configured.

carlpett commented 3 years ago

Also running into this. From what I can tell the port being not 5432 is ok (although I initially reacted to it as well): https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/ec20300fbae3d11e93a9cc62946463632f3d9773/proxy/proxy/client.go#L84-L86 Unlike you @wilhelmi, I can't connect with a local cloud-sql-proxy either, I don't have a route for the private IP that is used. Do you have a VPN that includes a route to the private network IP space?

itspngu commented 3 years ago

I've been battling this all day as well. However, like @carlpett, I had no luck connecting with the cloud_sql_proxy as long as the instance does not have a public v4 address. As per

https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#authentication-options

it also seems that "IAM user database authentication is not currently integrated with the Cloud SQL Auth proxy", which might be the root of my issues. Anyway, assigning a v4 (can be ephemeral) to the instance allows the provider to access the database¹ - in that case I also don't need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is great because ² the Proxy (unlike gogloud/terraform-provider-postgresql) does not support passing the contents of a key file rather than a path:

https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/323

While this doesn't solve the issue itself, I was able to work around it with the following (abridged) setup:

# terraform-service-account is a GCP service account with a bunch of scopes, for Cloud SQL & Networking we need at least:
# - roles/cloudsql.admin
# - roles/compute.networkAdmin

$ TF_VAR_GOOGLE_CREDENTIALS="$(cat terraform-service-account-key.json)" terraform <command>

The relevant bits from my .tf files:

provider "google-beta" {
  credentials = var.GOOGLE_CREDENTIALS
  # [...]
}

data "google_project" "default" {}

resource "google_compute_network" "default" {
  provider = google-beta

  name                    = "default"
  # [...]
  auto_create_subnetworks = true
}

resource "google_compute_subnetwork" "default" {
  provider = google-beta

  name    = "default"
  region  = "europe-west3"
  network = google_compute_network.default.id
  ip_cidr_range            = "10.156.0.0/20"
  purpose                  = "PRIVATE"
  private_ip_google_access = true
}

provider "postgresql" {
  alias = "pg-main"

  host     = google_sql_database_instance.pg-main.connection_name

  # I tried using a service account here, didn't work. Fell back to the "postgres" default user, which does work.
  username = google_sql_user.pg-main-postgres.name
  password = google_sql_user.pg-main-postgres.password

  scheme           = "gcppostgres"
  superuser        = false
  expected_version = var.POSTGRES_VERSION
}

resource "google_sql_database_instance" "pg-main" {
  provider = google-beta

  name                = "pg-main"
  database_version    = "POSTGRES_${var.POSTGRES_VERSION}"
  # [...]

  settings {
    ip_configuration {
      ipv4_enabled    = true
      require_ssl     = true
      private_network = data.google_compute_network.default.id
    }
    # :(
    database_flags {
      name  = "cloudsql.iam_authentication"
      value = "on"
    }
    # [...]
  }
}

resource "google_compute_global_address" "pg-main-private" {
  provider = google-beta

  name          = "pg-main-private"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 20
  network       = data.google_compute_network.default.id
}
resource "google_service_networking_connection" "pg-main-private-vpc" {
  provider = google-beta

  network = data.google_compute_network.default.id
  service = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [
    google_compute_global_address.pg-main-private.name
  ]

  depends_on = [google_compute_global_address.pg-main-private]
}

resource "random_password" "pg-main-postgres" {
  length           = 32
  special          = true
  override_special = "_-"
}
resource "google_sql_user" "pg-main-postgres" {
  name     = "postgres"
  password = random_password.pg-main-postgres.result
  instance = google_sql_database_instance.pg-main.name
}

# terraform-provider-postgres now works:
resource "postgresql_database" "pg-main-db1" {
  provider = postgresql.pg-main

  name  = "db1"
  owner = "some-role-you-can-also-define-with-the-provider"
}

¹: It does not have to be listed as an "authorized network" on the Cloud SQL instance, but somehow magically makes things work. I'm at wit's end here, really interested in @wilhelmi's setup...

²: Edit: I realized today while going through some logs that during my frantic attempts to make this work I must've run $ gcloud auth application-default login, which made it fall back to application default credentials and automatically discovered them, so you actually still need to populate GOOGLE_APPLICATION_CREDENTIALS with the path to a key file (or use that command, both of which are kind of iffy to do in automation), sorry for the wrong info there. :/

carlpett commented 3 years ago

@itspngu Without addressing your other points, the proxy does support IAM auth, see this pull request. It does seem to require adding some parameters to the invocation, though.

enocom commented 2 years ago

For what it’s worth, the proxy doesn’t create a network path. So if you’re trying to run operations against an instance that is private IP only and your terraform runner isn’t in the same VPC, it’s never going to work.

EricStG commented 1 year ago

Ran into this today If I read this right, the provider is missing the dialer configuration that would allow connection to a private instance https://cloud.google.com/sql/docs/postgres/samples/cloud-sql-postgres-databasesql-connect-connector

enocom commented 1 year ago

The Connector there is a newer version than the one used here. The one used here will always try public and private IP in that order.