wilhelmi opened this issue 3 years ago
EDIT: My issue ended up being due to network access control list (NACL) settings that were blocking the traffic between the subnet with my CodeBuild project and the subnet with RDS Aurora. My mistake!
I am seeing similar behavior when trying to connect to an RDS Aurora cluster endpoint. I get the "Still creating..." message until my project in AWS CodeBuild eventually times out.
Here is my provider configuration. Perhaps there is an issue with connecting via the Go CDK (gocloud.dev)? I know this is new in version 1.11.0.
provider "postgresql" {
scheme = "awspostgres"
host = data.aws_rds_cluster.data_science_postgresql.endpoint
port = data.aws_rds_cluster.data_science_postgresql.port
database = data.aws_rds_cluster.data_science_postgresql.database_name
username = data.aws_rds_cluster.data_science_postgresql.master_username
password = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["password"]
superuser = false
}
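For context, the data sources referenced in that provider block look roughly like the sketch below; the cluster identifier and secret name are placeholders rather than my real values:

# Placeholder sketch of the data sources used by the provider block above.
data "aws_rds_cluster" "data_science_postgresql" {
  cluster_identifier = "data-science-postgresql" # placeholder
}

data "aws_secretsmanager_secret_version" "master_db_credentials" {
  secret_id = "data-science-postgresql-master-credentials" # placeholder
}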
Like @wilhelmi, I am also connecting to a DB that only has a private DNS name (in a private subnet), and I am running Terraform from a container that is running in AWS CodeBuild in the same subnet.
I have combed through the TF_LOG=TRACE output, but I don't see any messages about connection issues. My RDS cluster also does not register a connection on the DB side.
@wilhelmi Thanks for opening the issue.
It will be hard for us to debug your problem as it seems to be more of a network configuration issue. Did you get any new information about this problem?
I'll try to execute your example as soon as I can to see if it works on my side.
I am also getting the exact same error. I have an existing database cluster that was created successfully from Terraform:
resource "aws_db_instance" "internal_pg" {
allocated_storage = 20
db_subnet_group_name = aws_db_subnet_group.internal_pg.name
vpc_security_group_ids = [aws_security_group.internal_pg.id]
storage_type = "gp2"
engine = "postgres"
engine_version = "12.5"
instance_class = "db.m6g.large"
name = "db_pg_us_west_2_staging"
identifier = "db-pg-us-west-2-staging"
final_snapshot_identifier = "deleted-db-pg-us-west-2-staging"
username = "db_admin"
password = "password"
parameter_group_name = "postgres12"
}
And then when I try to create an additional postgres db by appending to the same Terraform file:
provider "postgresql" {
host = aws_db_instance.internal_pg.address
port = aws_db_instance.internal_pg.port
username = aws_db_instance.internal_pg.username
password = aws_db_instance.internal_pg.password
expected_version = aws_db_instance.internal_pg.engine_version
sslmode = "require"
}
resource "postgresql_database" "internal_pg" {
name = "another_db"
}
And running terraform apply:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# postgresql_database.internal_pg will be created
+ resource "postgresql_database" "internal_pg" {
+ allow_connections = true
+ connection_limit = -1
+ encoding = (known after apply)
+ id = (known after apply)
+ is_template = (known after apply)
+ lc_collate = (known after apply)
+ lc_ctype = (known after apply)
+ name = "another_db"
+ owner = (known after apply)
+ tablespace_name = (known after apply)
+ template = (known after apply)
}
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions in workspace "internal-stg"?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
postgresql_database.internal_pg: Creating...
postgresql_database.internal_pg: Still creating... [10s elapsed]
postgresql_database.internal_pg: Still creating... [20s elapsed]
postgresql_database.internal_pg: Still creating... [30s elapsed]
postgresql_database.internal_pg: Still creating... [40s elapsed]
postgresql_database.internal_pg: Still creating... [50s elapsed]
postgresql_database.internal_pg: Still creating... [1m0s elapsed]
postgresql_database.internal_pg: Still creating... [1m10s elapsed]
Error: Error creating database "another_db": dial tcp 10.232.57.183:5432: connect: operation timed out
on main.tf line 175, in resource "postgresql_database" "internal_pg":
175: resource "postgresql_database" "internal_pg" {
And my providers.tf file looks like:
terraform {
  backend "local" {}

  required_providers {
    postgresql = {
      source  = "cyrilgdn/postgresql"
      version = "1.11.1"
    }
  }
}
Hi @cfloody ,
From where are you running Terraform? 10.232.57.183 is a private IP; are you running it from within the VPC?
@cfloody, I ran into this issue last week, and my setup is also in a private subnet. For me, I had to adjust my NACL to allow traffic between the private subnet running Terraform and the private subnet where I launched my RDS cluster. You should also check your security group rules and route tables; any of those three things could be blocking the traffic, and any one of them would produce the behavior you're seeing. I was able to confirm this was a network issue by trying to connect with psql, tweaking the network settings until that worked, and then returning to the Terraform configuration.
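For illustration, the fix looked roughly like the rules below; the NACL ID, security group IDs, and the 10.0.1.0/24 CIDR for the subnet running Terraform/CodeBuild are placeholders for your own values:

# Hypothetical NACL rules on the network ACL attached to the RDS subnets.
# NACLs are stateless, so return traffic on ephemeral ports needs its own rule.
resource "aws_network_acl_rule" "postgres_from_codebuild" {
  network_acl_id = aws_network_acl.rds.id # placeholder
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "10.0.1.0/24" # placeholder: subnet running Terraform
  from_port      = 5432
  to_port        = 5432
}

resource "aws_network_acl_rule" "return_traffic_to_codebuild" {
  network_acl_id = aws_network_acl.rds.id # placeholder
  rule_number    = 110
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = "10.0.1.0/24" # placeholder
  from_port      = 1024
  to_port        = 65535
}

# The security group on the RDS cluster also has to allow 5432 from
# wherever Terraform runs (here, the CodeBuild project's security group).
resource "aws_security_group_rule" "postgres_from_codebuild" {
  type                     = "ingress"
  security_group_id        = aws_security_group.rds.id       # placeholder
  source_security_group_id = aws_security_group.codebuild.id # placeholder
  protocol                 = "tcp"
  from_port                = 5432
  to_port                  = 5432
}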
For reference, here is the provider configuration that works for me. Note that I am using Aurora PostgreSQL Serverless. I am also reading the RDS connection information from data sources because I launched my cluster from a different Terraform configuration.
provider "postgresql" {
scheme = var.pg_scheme # postgres provider 1.11+ specifies awspostgres for RDS, but the GO CDK doesn't verify the x509 cert correctly
host = data.aws_rds_cluster.data_science_postgresql.endpoint
port = data.aws_rds_cluster.data_science_postgresql.port
database = data.aws_rds_cluster.data_science_postgresql.database_name
username = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["username"]
password = jsondecode(data.aws_secretsmanager_secret_version.master_db_credentials.secret_string)["password"]
superuser = false
sslmode = "require"
sslrootcert = "/code/manifest/AmazonRootCA1.pem"
expected_version = "10.12"
}
You can get the AWS root cert from the RDS documentation here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html. I found this worked better than letting the Go CDK handle it, because that was returning x509 cert errors. This seems to be a Go/AWS thing, but folks are working on it.
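If you'd rather not check the PEM file into your repo, something like the following might also work (untested sketch; the bundle URL should match whatever the RDS SSL documentation linked above currently lists, and the hashicorp/http and hashicorp/local providers are assumed):

# Untested sketch: fetch the RDS CA bundle and write it to disk so the
# postgresql provider's sslrootcert can point at it.
data "http" "rds_ca_bundle" {
  url = "https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem" # check the RDS docs for the current URL
}

resource "local_file" "rds_ca_bundle" {
  content  = data.http.rds_ca_bundle.body
  filename = "${path.module}/rds-ca-bundle.pem"
}

In practice, checking the certificate file in (as I did with sslrootcert above) is simpler, because the file has to exist on disk before the postgresql provider is configured.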
I am running this from a local workstation outside of GCP, which is the target in this case. From my reading of the gocloud library, it should be using the cloud-sql-proxy lib to connect to the DB, and that should work even though the DB only has a private IP. I can connect to the target DB from the same machine if I start cloud-sql-proxy from the CLI, so I think the network route works. One thing that jumped out at me was the 3307 port when 5432 was specifically configured.
Also running into this. From what I can tell, the port not being 5432 is OK (although I initially reacted to it as well): https://github.com/GoogleCloudPlatform/cloudsql-proxy/blob/ec20300fbae3d11e93a9cc62946463632f3d9773/proxy/proxy/client.go#L84-L86. Unlike you, @wilhelmi, I can't connect with a local cloud-sql-proxy either, as I don't have a route to the private IP that is used. Do you have a VPN that includes a route to the private network IP space?
I've been battling this all day as well. However, like @carlpett, I had no luck connecting with the cloud_sql_proxy as long as the instance does not have a public v4 address. As per https://cloud.google.com/sql/docs/postgres/connect-admin-proxy#authentication-options it also seems that "IAM user database authentication is not currently integrated with the Cloud SQL Auth proxy", which might be the root of my issues. Anyway, assigning a v4 address (can be ephemeral) to the instance allows the provider to access the database¹. In that case I also don't need to set the proxy's GOOGLE_APPLICATION_CREDENTIALS environment variable², which is great because the proxy (unlike gocloud/terraform-provider-postgresql) does not support passing the contents of a key file rather than a path: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/323
While this doesn't solve the issue itself, I was able to work around it with the following (abridged) setup:
# terraform-service-account is a GCP service account with a bunch of scopes, for Cloud SQL & Networking we need at least:
# - roles/cloudsql.admin
# - roles/compute.networkAdmin
$ TF_VAR_GOOGLE_CREDENTIALS="$(cat terraform-service-account-key.json)" terraform <command>
The relevant bits from my .tf files:
provider "google-beta" {
credentials = var.GOOGLE_CREDENTIALS
# [...]
}
data "google_project" "default" {}
resource "google_compute_network" "default" {
provider = google-beta
name = "default"
# [...]
auto_create_subnetworks = true
}
resource "google_compute_subnetwork" "default" {
provider = google-beta
name = "default"
region = "europe-west3"
network = google_compute_network.default.id
ip_cidr_range = "10.156.0.0/20"
purpose = "PRIVATE"
private_ip_google_access = true
}
provider "postgresql" {
alias = "pg-main"
host = google_sql_database_instance.pg-main.connection_name
# I tried using a service account here, didn't work. Fell back to the "postgres" default user, which does work.
username = google_sql_user.pg-main-postgres.name
password = google_sql_user.pg-main-postgres.password
scheme = "gcppostgres"
superuser = false
expected_version = var.POSTGRES_VERSION
}
resource "google_sql_database_instance" "pg-main" {
provider = google-beta
name = "pg-main"
database_version = "POSTGRES_${var.POSTGRES_VERSION}"
# [...]
settings {
ip_configuration {
ipv4_enabled = true
require_ssl = true
private_network = data.google_compute_network.default.id
}
# :(
database_flags {
name = "cloudsql.iam_authentication"
value = "on"
}
# [...]
}
}
resource "google_compute_global_address" "pg-main-private" {
provider = google-beta
name = "pg-main-private"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 20
network = data.google_compute_network.default.id
}
resource "google_service_networking_connection" "pg-main-private-vpc" {
provider = google-beta
network = data.google_compute_network.default.id
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = [
google_compute_global_address.pg-main-private.name
]
depends_on = [google_compute_global_address.pg-main-private]
}
resource "random_password" "pg-main-postgres" {
length = 32
special = true
override_special = "_-"
}
resource "google_sql_user" "pg-main-postgres" {
name = "postgres"
password = random_password.pg-main-postgres.result
instance = google_sql_database_instance.pg-main.name
}
# terraform-provider-postgres now works:
resource "postgresql_database" "pg-main-db1" {
provider = postgresql.pg-main
name = "db1"
owner = "some-role-you-can-also-define-with-the-provider"
}
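The variable declarations referenced above (var.GOOGLE_CREDENTIALS and var.POSTGRES_VERSION) aren't shown in the abridged config; presumably they are something along these lines:

# Assumed variable declarations matching the references above.
variable "GOOGLE_CREDENTIALS" {
  description = "Contents of the service account key file, passed in via TF_VAR_GOOGLE_CREDENTIALS"
  type        = string
}

variable "POSTGRES_VERSION" {
  description = "Cloud SQL PostgreSQL major version, e.g. \"13\""
  type        = string
}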
¹: It does not have to be listed as an "authorized network" on the Cloud SQL instance, but somehow magically makes things work. I'm at wit's end here, really interested in @wilhelmi's setup...
²: Edit: I realized today while going through some logs that, during my frantic attempts to make this work, I must've run $ gcloud auth application-default login, which made it fall back to application default credentials and discover them automatically. So you actually still need to populate GOOGLE_APPLICATION_CREDENTIALS with the path to a key file (or use that command, both of which are kind of iffy to do in automation). Sorry for the wrong info there. :/
@itspngu Without addressing your other points, the proxy does support IAM auth, see this pull request. It does seem to require adding some parameters to the invocation, though.
For what it’s worth, the proxy doesn’t create a network path. So if you’re trying to run operations against an instance that is private IP only and your terraform runner isn’t in the same VPC, it’s never going to work.
Ran into this today. If I read this right, the provider is missing the dialer configuration that would allow connecting to a private instance: https://cloud.google.com/sql/docs/postgres/samples/cloud-sql-postgres-databasesql-connect-connector
The Connector shown there is a newer version than the one used here; the one used here will always try the public IP and then the private IP, in that order.
Terraform Version
0.13.6
Affected Resource(s)
Terraform Configuration Files
Debug Output
Expected Behavior
The new database is created, via cloud_sql_proxy tunnel provided by the go cloud lib.
Actual Behavior
postgresql_database.my_db: Still creating... [1m10s elapsed]
Error: Error creating database "my_db": dial tcp 10.35.0.3:3307: connect: operation timed out
Steps to Reproduce
terraform apply
Important Factoids
I can connect to the DB via local cloud_sql_proxy. DB does not have a public IP.