ChainSafe / forest-iac

Infrastructure as Code to support the Forest Filecoin project
Apache License 2.0

Timeout on remote exec #424

Open LesnyRumcajs opened 3 months ago

LesnyRumcajs commented 3 months ago

Issue summary

While deploying the node or snapshot services, Terraform intermittently failed to connect to the droplet over SSH. For example, one deployment succeeded only on the third attempt; the earlier attempts failed with the following timeout:

digitalocean_droplet.forest (remote-exec): Connecting to remote host via SSH...
digitalocean_droplet.forest (remote-exec):   Host: 209.38.234.101
digitalocean_droplet.forest (remote-exec):   User: root
digitalocean_droplet.forest (remote-exec):   Password: false
digitalocean_droplet.forest (remote-exec):   Private key: false
digitalocean_droplet.forest (remote-exec):   Certificate: false
digitalocean_droplet.forest (remote-exec):   SSH Agent: true
digitalocean_droplet.forest (remote-exec):   Checking Host Key: false
digitalocean_droplet.forest (remote-exec):   Target Platform: unix
digitalocean_droplet.forest: Still creating... [5m0s elapsed]
digitalocean_droplet.forest: Still creating... [5m10s elapsed]
digitalocean_droplet.forest: Still creating... [5m20s elapsed]
digitalocean_droplet.forest (remote-exec): Connecting to remote host via SSH...
digitalocean_droplet.forest (remote-exec):   Host: 209.38.234.101
digitalocean_droplet.forest (remote-exec):   User: root
digitalocean_droplet.forest (remote-exec):   Password: false
digitalocean_droplet.forest (remote-exec):   Private key: false
digitalocean_droplet.forest (remote-exec):   Certificate: false
digitalocean_droplet.forest (remote-exec):   SSH Agent: true
digitalocean_droplet.forest (remote-exec):   Checking Host Key: false
digitalocean_droplet.forest (remote-exec):   Target Platform: unix
digitalocean_droplet.forest: Still creating... [5m30s elapsed]
digitalocean_droplet.forest: Still creating... [5m40s elapsed]
digitalocean_droplet.forest: Still creating... [5m50s elapsed]
digitalocean_droplet.forest: Still creating... [6m0s elapsed]
digitalocean_droplet.forest (remote-exec): Connecting to remote host via SSH...
digitalocean_droplet.forest (remote-exec):   Host: 209.38.234.101
digitalocean_droplet.forest (remote-exec):   User: root
digitalocean_droplet.forest (remote-exec):   Password: false
digitalocean_droplet.forest (remote-exec):   Private key: false
digitalocean_droplet.forest (remote-exec):   Certificate: false
digitalocean_droplet.forest (remote-exec):   SSH Agent: true
digitalocean_droplet.forest (remote-exec):   Checking Host Key: false
digitalocean_droplet.forest (remote-exec):   Target Platform: unix
digitalocean_droplet.forest: Still creating... [6m10s elapsed]
╷
│ Error: remote-exec provisioner error
│ 
│   with digitalocean_droplet.forest,
│   on main.tf line 50, in resource "digitalocean_droplet" "forest":
│   50:   provisioner "remote-exec" {
│ 
│ timeout - last error: dial tcp 209.38.234.101:22: i/o timeout
╵
time=2024-03-12T08:05:19Z level=error msg=terraform invocation failed in /home/runner/work/forest-iac/forest-iac/tf-managed/live/environments/prod/applications/forest-butterflynet/.terragrunt-cache/NHFD3q0GdGpJF-apYUdKkrp8WkU/bKi-1jljNp0vP3Ch1WPabqRMasU prefix=[/home/runner/work/forest-iac/forest-iac/tf-managed/live/environments/prod/applications/forest-butterflynet] 
time=2024-03-12T08:05:19Z level=error msg=1 error occurred:
    * [/home/runner/work/forest-iac/forest-iac/tf-managed/live/environments/prod/applications/forest-butterflynet/.terragrunt-cache/NHFD3q0GdGpJF-apYUdKkrp8WkU/bKi-1jljNp0vP3Ch1WPabqRMasU] exit status 1

The same error also occurred a week earlier, during the snapshot service deployment.

There are a few possible culprits:

Such a failure may leave zombie instances behind: droplets that were created but whose initialization script never ran.

Ideally we would fix the root cause, but automatically retrying a few times would also be an acceptable workaround.
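A minimal sketch of the retry workaround, assuming the deployment keeps running through Terragrunt (as the log above suggests) and that the Terragrunt version in use supports the `retryable_errors`, `retry_max_attempts` and `retry_sleep_interval_sec` attributes. The regex and the numbers are illustrative choices rather than values from the repository, and placing this in a shared root `terragrunt.hcl` is an assumption.

```hcl
# Hypothetical addition to a terragrunt.hcl; all values are illustrative.

# Re-run the Terraform command when its output matches one of these regexes.
# Setting this list replaces Terragrunt's default retryable errors rather
# than extending them.
retryable_errors = [
  "(?s).*dial tcp .*:22: i/o timeout.*",
]

# Cap the number of retry attempts.
retry_max_attempts = 3

# Pause between attempts (in seconds).
retry_sleep_interval_sec = 30
```

With Terraform's default `on_failure = fail` behaviour, a failed creation-time provisioner marks the droplet as tainted, so a retried `apply` should destroy and recreate it rather than keep the uninitialized instance. This may help with the zombie-instance concern, but it does not address the underlying SSH connectivity problem.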

Other information and links

samuelarogbonlo commented 2 months ago

We can definitely increase the timeout.
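A minimal sketch of raising the SSH timeout, assuming the `remote-exec` provisioner in `main.tf` connects through a `connection` block similar to the one below. Terraform's `connection` block accepts a `timeout` argument that defaults to 5 minutes, which may simply be too short for a freshly booted droplet. The droplet arguments, the `10m` value and the `inline` command are hypothetical placeholders, not the repository's actual configuration.

```hcl
resource "digitalocean_droplet" "forest" {
  # Placeholder arguments; the real main.tf defines its own values
  # (ssh_keys, tags, etc. omitted here).
  name   = "forest-example"
  image  = "ubuntu-22-04-x64"
  region = "fra1"
  size   = "s-1vcpu-1gb"

  # Connection settings used by this resource's provisioners, mirroring the
  # log: root user, SSH agent authentication, no host key pinned.
  connection {
    type    = "ssh"
    host    = self.ipv4_address
    user    = "root"
    agent   = true
    timeout = "10m" # Terraform's default is 5m
  }

  provisioner "remote-exec" {
    # Stand-in for the real initialization commands.
    inline = ["echo 'connected'"]
  }
}
```

A longer timeout only helps if the host eventually becomes reachable; if port 22 stays blocked, the apply will just fail later.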