EngineerBetter / control-tower

Deploy and operate Concourse CI in a single command
https://www.engineerbetter.com
Apache License 2.0
121 stars, 39 forks

cannot update deployment (google_sql_database.director error) #62

Open lnhrdt opened 4 years ago

lnhrdt commented 4 years ago

On the deploy docs page there's a note that

The control plane will be restricted to the IP control-tower deploy was run from.

I am working from a location with a dynamic IP address, and I would like to update my deployment from a new IP address. How can I update the control plane's allowed IP address setting?

lnhrdt commented 4 years ago

On second thought, I still believe my issue is caused by my IP address changing, but I'm not completely certain. In case I've misunderstood the cause, here is the error message I encounter when I run the deploy command:

google_compute_firewall.sql: Refreshing state... (ID: control-tower-my-project-sql)

Error: Error refreshing state: 1 error(s) occurred:

* google_sql_database.director: 1 error(s) occurred:

* google_sql_database.director: google_sql_database.director: Error reading SqlDatabase "bosh-XXX:udb": Get https://www.googleapis.com/sql/v1beta4/projects/my-project/instances/bosh-XXX/databases/udb?alt=json: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

crsimmons commented 4 years ago

The only thing that gets restricted by this particular whitelist is access to the BOSH director. Every time you run control-tower deploy, it re-whitelists your current IP before doing anything else.

The allow-ips flag is completely different: it restricts access to Concourse itself (i.e. the web UI and CredHub).

There currently isn't a way to whitelist a range of IPs in the BOSH director whitelist, but you could go in and manually change the rules on the control-tower-<your deployment>-director firewall rule after every deploy.
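For the manual workaround, a minimal sketch might look like the following. It assumes the rule name follows the control-tower-<your deployment>-director pattern described above and that you use the gcloud CLI's `gcloud compute firewall-rules update` with `--source-ranges`. The helper `director_firewall_update_cmd` is hypothetical; it only builds the command so you can inspect it before running it.

```python
import ipaddress


def director_firewall_update_cmd(deployment: str, ranges: list[str]) -> list[str]:
    """Build (but do not run) a gcloud command that sets the source ranges
    of the BOSH director firewall rule.

    Hypothetical helper. Assumes the rule is named
    control-tower-<deployment>-director.
    """
    if not ranges:
        raise ValueError("at least one IP or CIDR range is required")
    # Normalise each entry: a bare IP becomes a single-host network (/32 for
    # IPv4), and host bits are cleared (203.0.113.7/24 -> 203.0.113.0/24).
    cidrs = [str(ipaddress.ip_network(r.strip(), strict=False)) for r in ranges]
    return [
        "gcloud", "compute", "firewall-rules", "update",
        f"control-tower-{deployment}-director",
        # --source-ranges replaces the rule's existing ranges; it does not append.
        "--source-ranges=" + ",".join(cidrs),
    ]


if __name__ == "__main__":
    cmd = director_firewall_update_cmd("my-project", ["203.0.113.7", "198.51.100.0/24"])
    print(" ".join(cmd))
    # Pass cmd to subprocess.run(cmd, check=True) to actually apply it.
```

Because control-tower deploy re-whitelists your current IP on every run, expect a subsequent deploy to overwrite whatever ranges this sets.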

lnhrdt commented 4 years ago

@crsimmons thanks for the clarification; it's helpful to know that the deploy command re-whitelists my IP. It seems I misunderstood the issue I'm facing.

Can you help me figure out what the actual cause of this error is?

google_compute_firewall.sql: Refreshing state... (ID: control-tower-my-project-sql)

Error: Error refreshing state: 1 error(s) occurred:

* google_sql_database.director: 1 error(s) occurred:

* google_sql_database.director: google_sql_database.director: Error reading SqlDatabase "bosh-XXX:udb": Get https://www.googleapis.com/sql/v1beta4/projects/my-project/instances/bosh-XXX/databases/udb?alt=json: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I'm unable to update my deployment.

DanielJonesEB commented 4 years ago

Hey @lnhrdt! It looks like there's a connectivity issue between the machine you're running Control Tower from and the Google Cloud API. That error message is logged by Terraform (which Control Tower uses under the hood), and Get https://www.googleapis.com/sql/v1beta4/projects/my-project/instances/bosh-XXX/databases/udb?alt=json: net/http: request canceled (Client.Timeout exceeded while awaiting headers) means that Terraform timed out waiting for response headers from Google Cloud.

I'd try testing your connectivity to the Google Cloud APIs. It looks like you can establish a TCP connection, as otherwise you'd get a connection refused or i/o timeout error. I think you're establishing a connection, but Google isn't sending the HTTP response back down the wire before Terraform times out.
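One way to separate those two failure modes is to probe in two stages: a bare TCP connect first, then a full request on a fresh connection. The sketch below is an illustration, not something Control Tower or Terraform provides; the function name `probe` and its return strings are hypothetical, and it uses only Python's standard `socket` and `http.client` modules.

```python
import http.client
import socket


def probe(host: str, port: int = 443, path: str = "/", timeout: float = 10.0,
          https: bool = True) -> str:
    """Report which stage of an HTTP(S) request to host:port fails (hypothetical helper)."""
    # Stage 1: plain TCP connect, no TLS or HTTP yet. A failure here is the
    # "connection refused" / "i/o timeout" case.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
    except socket.timeout:
        return "tcp: i/o timeout"
    except ConnectionRefusedError:
        return "tcp: connection refused"
    except OSError as exc:  # DNS failure, unreachable network, etc.
        return f"tcp: {exc}"

    # Stage 2: a full request on a new connection. A timeout here means the
    # TCP connect worked but no response arrived in time. With https=True it
    # may also come from the TLS handshake, not only from awaiting headers.
    conn_cls = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return f"http: status {conn.getresponse().status}"
    except socket.timeout:
        return "http: timed out after TCP connect"
    except (OSError, http.client.HTTPException) as exc:
        return f"http: {exc}"
    finally:
        conn.close()


if __name__ == "__main__":
    print(probe("www.googleapis.com"))
```

Against www.googleapis.com, any "http: status" result (even a 404 for the bare path) means headers came back, while "http: timed out after TCP connect" would match the symptom in the Terraform error.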

lnhrdt commented 4 years ago

Hey @DanielJonesEB thanks for the clarifications and suggestions.

In hopes of pinpointing my problem, I've used Control Tower to create, update, and destroy multiple installations. The conclusion I've come to is that everything works fine until I run the self-update job. When I run the job it succeeds, but future deploy commands on the same installation fail regardless of whether they change the configuration or simply reissue the original configuration.

I'd like to be able to rely on the self-update job. Any ideas?

Edit: I just encountered the same failure without running the self-update job. It seems to appear only after some time has passed, and I haven't correlated it with any specific action. I've tried from the same IP and from entirely new internet connections. Once the error occurs, it persists until I destroy the installation. Destroying and recreating my installation every time I want to update the configuration obviously isn't ideal, so I'll keep trying to pinpoint the issue. I'll update here if I find anything, but any other ideas would be very welcome.

DanielJonesEB commented 4 years ago

Wow, that's really odd! The only thing I can think of is to try replicating what Terraform is doing via other clients: for example, try getting/describing the database via gcloud, and see whether you get the same error.
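From the CLI, the closest equivalent is probably `gcloud sql databases describe udb --instance=bosh-XXX --project=my-project`. To replay the exact request from the Terraform error instead, a minimal sketch follows. It assumes the gcloud CLI is installed and logged in (it borrows a token via `gcloud auth print-access-token`). The helpers `database_url` and `describe_database` are hypothetical, and bosh-XXX stays a placeholder for your real instance name.

```python
import json
import subprocess
import urllib.parse
import urllib.request

API = "https://www.googleapis.com/sql/v1beta4"


def database_url(project: str, instance: str, database: str) -> str:
    """Rebuild the URL shown in the Terraform error (hypothetical helper)."""
    q = urllib.parse.quote
    return (f"{API}/projects/{q(project, safe='')}/instances/{q(instance, safe='')}"
            f"/databases/{q(database, safe='')}?alt=json")


def describe_database(project: str, instance: str, database: str,
                      timeout: float = 30.0) -> dict:
    """GET the database resource directly, outside Terraform (hypothetical helper)."""
    # Reuse the gcloud CLI's credentials; assumes gcloud is installed and logged in.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    req = urllib.request.Request(
        database_url(project, instance, database),
        headers={"Authorization": f"Bearer {token}"},
    )
    # A timeout raised here would reproduce the symptom outside Terraform.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(describe_database("my-project", "bosh-XXX", "udb"))
```

If this call succeeds quickly while Terraform still times out, that points towards the Terraform/provider side rather than the network path.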