SlideRuleEarth / sliderule-prov-sys

Provisioning System for Slide Rule clusters
BSD 3-Clause "New" or "Revised" License

Failed to change min and max nodes #97

Open jpswinski opened 1 year ago

jpswinski commented 1 year ago

I tried to "Configure" the min and max nodes for the public sliderule cluster. The running configuration was 7-7-30 (min-desired-max), and I changed the min to 40 and the max to 100. This put the cluster into a state where the new configuration was 40-7-100. ProvSys then automatically generated an update to change it to 40-40-100. When that update ran, the deploy failed:

**************** cmd submitted: ['terraform', '-chdir=/ps_server/sliderule/terraform', 'apply', '-auto-approve', '-var', 'cluster_version=v3', '-var', 'domain=slideruleearth.io', '-var', 'is_public=True', '-var', 'cluster_name=sliderule', '-var', 'node_asg_min_capacity=40', '-var', 'node_asg_max_capacity=100', '-var', 'node_asg_desired_capacity=40'] at 2023-08-07 12:27:17 UTC
data.aws_ami.sliderule_cluster_ami: Reading...
data.aws_route53_zone.selected: Reading...
aws_s3_bucket_object.cron-job["cronjob.txt"]: Refreshing state... [id=infrastructure/software/sliderule-cronjob.txt]
aws_s3_bucket_object.docker-compose-config["docker-compose-ilb.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-ilb.yml]
aws_s3_bucket_object.export-log-script["export_logs.sh"]: Refreshing state... [id=infrastructure/software/sliderule-export_logs.sh]
aws_iam_role.s3-role: Refreshing state... [id=sliderule-iam-role]
aws_vpc.sliderule-vpc: Refreshing state... [id=vpc-09ca2859195464a0e]
aws_iam_policy.s3-policy: Refreshing state... [id=arn:aws:iam::742127912612:policy/sliderule-iams3-policy]
aws_s3_bucket_object.docker-compose-config["docker-compose-sliderule.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-sliderule.yml]
aws_s3_bucket_object.docker-compose-config["docker-compose-monitor.yml"]: Refreshing state... [id=infrastructure/software/sliderule-docker-compose-monitor.yml]
data.aws_secretsmanager_secret_version.secrets: Reading...
data.aws_ami.sliderule_cluster_ami: Read complete after 0s [id=ami-0098740cce22bf29d]
aws_iam_policy.ec2-policy: Refreshing state... [id=arn:aws:iam::742127912612:policy/sliderule-iamec2-policy]
data.aws_secretsmanager_secret_version.secrets: Read complete after 0s [id=slideruleearth.io/secrets|AWSCURRENT]
aws_iam_role_policy_attachment.s3-role-policy-local: Refreshing state... [id=sliderule-iam-role-20230726141827905800000001]
aws_iam_role_policy_attachment.ec2-role-policy-local: Refreshing state... [id=sliderule-iam-role-20230726141828021000000004]
aws_iam_role_policy_attachment.ec2-role-policy-aec2crro: Refreshing state... [id=sliderule-iam-role-20230726141828120900000006]
aws_iam_role_policy_attachment.ec2-role-policy-cwaap: Refreshing state... [id=sliderule-iam-role-20230726141828021700000005]
aws_iam_role_policy_attachment.ec2-role-policy-cwasp: Refreshing state... [id=sliderule-iam-role-20230726141827906200000002]
aws_iam_role_policy_attachment.ec2-role-policy-assmmic: Refreshing state... [id=sliderule-iam-role-20230726141827908600000003]
aws_iam_instance_profile.s3-role: Refreshing state... [id=sliderule-iam-profile]
aws_security_group.monitor-sg: Refreshing state... [id=sg-00ac8f5dbfa460be8]
aws_subnet.sliderule-subnet: Refreshing state... [id=subnet-01ef68d0c96b01bca]
aws_internet_gateway.sliderule-gateway: Refreshing state... [id=igw-0df6227bb09a18842]
aws_security_group.sliderule-sg: Refreshing state... [id=sg-07fbd3d2eeddcb97c]
aws_security_group.ilb-sg: Refreshing state... [id=sg-0d13068a5391bfd49]
aws_route_table.sliderule-route: Refreshing state... [id=rtb-0dc0a92d70b3da62d]
aws_route_table_association.sliderule-route-association: Refreshing state... [id=rtbassoc-0d00fe60b8f3570b9]
aws_instance.ilb: Refreshing state... [id=i-03329bd36c19bbbe2]
aws_instance.monitor: Refreshing state... [id=i-08eba42107b54c36b]
aws_launch_configuration.sliderule-instance: Refreshing state... [id=terraform-20230726141840086100000007]
aws_autoscaling_group.sliderule-cluster: Refreshing state... [id=terraform-20230726141849028500000008]
data.aws_route53_zone.selected: Read complete after 1s [id=Z0526045IQLILBFI9THF]
aws_route53_record.org: Refreshing state... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # aws_autoscaling_group.sliderule-cluster will be updated in-place
  ~ resource "aws_autoscaling_group" "sliderule-cluster" {
      ~ desired_capacity          = 7 -> 40
        id                        = "terraform-20230726141849028500000008"
      ~ launch_configuration      = "terraform-20230726141840086100000007" -> (known after apply)
      ~ max_size                  = 30 -> 100
      ~ min_size                  = 7 -> 40
        name                      = "terraform-20230726141849028500000008"
        # (21 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # aws_instance.ilb must be replaced
-/+ resource "aws_instance" "ilb" {
      ~ ami                                  = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ arn                                  = "arn:aws:ec2:us-west-2:742127912612:instance/i-03329bd36c19bbbe2" -> (known after apply)
      ~ cpu_core_count                       = 2 -> (known after apply)
      ~ cpu_threads_per_core                 = 1 -> (known after apply)
      ~ disable_api_stop                     = false -> (known after apply)
      ~ disable_api_termination              = false -> (known after apply)
      - hibernation                          = false -> null
      ~ host_id                              = "" -> (known after apply)
      + host_resource_group_arn              = (known after apply)
      ~ id                                   = "i-03329bd36c19bbbe2" -> (known after apply)
      ~ instance_initiated_shutdown_behavior = "stop" -> (known after apply)
      ~ instance_state                       = "running" -> (known after apply)
      ~ ipv6_address_count                   = 0 -> (known after apply)
      ~ ipv6_addresses                       = [] -> (known after apply)
      ~ outpost_arn                          = "" -> (known after apply)
      ~ password_data                        = "" -> (known after apply)
      ~ placement_group                      = "" -> (known after apply)
      ~ placement_partition_number           = 0 -> (known after apply)
      ~ primary_network_interface_id         = "eni-07385c240fd14a226" -> (known after apply)
      ~ private_dns                          = "ip-10-0-1-5.us-west-2.compute.internal" -> (known after apply)
      ~ public_dns                           = "ec2-52-41-92-196.us-west-2.compute.amazonaws.com" -> (known after apply)
      ~ public_ip                            = "52.41.92.196" -> (known after apply)
      ~ secondary_private_ips                = [] -> (known after apply)
      ~ security_groups                      = [] -> (known after apply)
        tags                                 = {
            "Name" = "sliderule-ilb"
        }
      ~ tenancy                              = "default" -> (known after apply)
      + user_data_base64                     = (known after apply)
        # (15 unchanged attributes hidden)

      ~ capacity_reservation_specification {
          ~ capacity_reservation_preference = "open" -> (known after apply)

          + capacity_reservation_target {
              + capacity_reservation_id                 = (known after apply)
              + capacity_reservation_resource_group_arn = (known after apply)
            }
        }

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + kms_key_id            = (known after apply)
          + snapshot_id           = (known after apply)
          + tags                  = (known after apply)
          + throughput            = (known after apply)
          + volume_id             = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      ~ enclave_options {
          ~ enabled = false -> (known after apply)
        }

      + ephemeral_block_device {
          + device_name  = (known after apply)
          + no_device    = (known after apply)
          + virtual_name = (known after apply)
        }

      ~ maintenance_options {
          ~ auto_recovery = "default" -> (known after apply)
        }

      ~ metadata_options {
          ~ http_endpoint               = "enabled" -> (known after apply)
          ~ http_put_response_hop_limit = 1 -> (known after apply)
          ~ http_tokens                 = "optional" -> (known after apply)
          ~ instance_metadata_tags      = "disabled" -> (known after apply)
        }

      + network_interface {
          + delete_on_termination = (known after apply)
          + device_index          = (known after apply)
          + network_card_index    = (known after apply)
          + network_interface_id  = (known after apply)
        }

      ~ private_dns_name_options {
          ~ enable_resource_name_dns_a_record    = false -> (known after apply)
          ~ enable_resource_name_dns_aaaa_record = false -> (known after apply)
          ~ hostname_type                        = "ip-name" -> (known after apply)
        }

      ~ root_block_device {
          ~ device_name           = "/dev/sda1" -> (known after apply)
          ~ encrypted             = false -> (known after apply)
          ~ iops                  = 120 -> (known after apply)
          + kms_key_id            = (known after apply)
          - tags                  = {} -> null
          ~ throughput            = 0 -> (known after apply)
          ~ volume_id             = "vol-0eac524fb9834a7d8" -> (known after apply)
            # (3 unchanged attributes hidden)
        }
    }

  # aws_instance.monitor must be replaced
-/+ resource "aws_instance" "monitor" {
      ~ ami                                  = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ arn                                  = "arn:aws:ec2:us-west-2:742127912612:instance/i-08eba42107b54c36b" -> (known after apply)
      ~ cpu_core_count                       = 2 -> (known after apply)
      ~ cpu_threads_per_core                 = 1 -> (known after apply)
      ~ disable_api_stop                     = false -> (known after apply)
      ~ disable_api_termination              = false -> (known after apply)
      - hibernation                          = false -> null
      ~ host_id                              = "" -> (known after apply)
      + host_resource_group_arn              = (known after apply)
      ~ id                                   = "i-08eba42107b54c36b" -> (known after apply)
      ~ instance_initiated_shutdown_behavior = "stop" -> (known after apply)
      ~ instance_state                       = "running" -> (known after apply)
      ~ ipv6_address_count                   = 0 -> (known after apply)
      ~ ipv6_addresses                       = [] -> (known after apply)
      ~ outpost_arn                          = "" -> (known after apply)
      ~ password_data                        = "" -> (known after apply)
      ~ placement_group                      = "" -> (known after apply)
      ~ placement_partition_number           = 0 -> (known after apply)
      ~ primary_network_interface_id         = "eni-04231e20eaf77763b" -> (known after apply)
      ~ private_dns                          = "ip-10-0-1-4.us-west-2.compute.internal" -> (known after apply)
      ~ public_dns                           = "ec2-34-217-85-62.us-west-2.compute.amazonaws.com" -> (known after apply)
      ~ public_ip                            = "34.217.85.62" -> (known after apply)
      ~ secondary_private_ips                = [] -> (known after apply)
      ~ security_groups                      = [] -> (known after apply)
        tags                                 = {
            "Name" = "sliderule-monitor"
        }
      ~ tenancy                              = "default" -> (known after apply)
      + user_data_base64                     = (known after apply)
        # (15 unchanged attributes hidden)

      ~ capacity_reservation_specification {
          ~ capacity_reservation_preference = "open" -> (known after apply)

          + capacity_reservation_target {
              + capacity_reservation_id                 = (known after apply)
              + capacity_reservation_resource_group_arn = (known after apply)
            }
        }

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + kms_key_id            = (known after apply)
          + snapshot_id           = (known after apply)
          + tags                  = (known after apply)
          + throughput            = (known after apply)
          + volume_id             = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      ~ enclave_options {
          ~ enabled = false -> (known after apply)
        }

      + ephemeral_block_device {
          + device_name  = (known after apply)
          + no_device    = (known after apply)
          + virtual_name = (known after apply)
        }

      ~ maintenance_options {
          ~ auto_recovery = "default" -> (known after apply)
        }

      ~ metadata_options {
          ~ http_endpoint               = "enabled" -> (known after apply)
          ~ http_put_response_hop_limit = 1 -> (known after apply)
          ~ http_tokens                 = "optional" -> (known after apply)
          ~ instance_metadata_tags      = "disabled" -> (known after apply)
        }

      + network_interface {
          + delete_on_termination = (known after apply)
          + device_index          = (known after apply)
          + network_card_index    = (known after apply)
          + network_interface_id  = (known after apply)
        }

      ~ private_dns_name_options {
          ~ enable_resource_name_dns_a_record    = false -> (known after apply)
          ~ enable_resource_name_dns_aaaa_record = false -> (known after apply)
          ~ hostname_type                        = "ip-name" -> (known after apply)
        }

      ~ root_block_device {
          ~ device_name           = "/dev/sda1" -> (known after apply)
          ~ encrypted             = false -> (known after apply)
          ~ iops                  = 120 -> (known after apply)
          + kms_key_id            = (known after apply)
          - tags                  = {} -> null
          ~ throughput            = 0 -> (known after apply)
          ~ volume_id             = "vol-0530bdd513527cc9c" -> (known after apply)
            # (3 unchanged attributes hidden)
        }
    }

  # aws_launch_configuration.sliderule-instance must be replaced
-/+ resource "aws_launch_configuration" "sliderule-instance" {
      ~ arn                              = "arn:aws:autoscaling:us-west-2:742127912612:launchConfiguration:c85e1f4e-27f8-4e49-bc00-5eadd8d2c78f:launchConfigurationName/terraform-20230726141840086100000007" -> (known after apply)
      ~ ebs_optimized                    = false -> (known after apply)
      ~ id                               = "terraform-20230726141840086100000007" -> (known after apply)
      ~ image_id                         = "ami-0bc69fc2ee174d0a9" -> "ami-0098740cce22bf29d" # forces replacement
      ~ name                             = "terraform-20230726141840086100000007" -> (known after apply)
      ~ name_prefix                      = "terraform-" -> (known after apply)
      ~ user_data                        = "a2fbce5475342f18d5a84eb88a9b7a233dbc4f34" -> "77c7c4dd37c864912d90616e7689dd593ca50599" # forces replacement
      - vpc_classic_link_security_groups = [] -> null
        # (6 unchanged attributes hidden)

      + ebs_block_device {
          + delete_on_termination = (known after apply)
          + device_name           = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + no_device             = (known after apply)
          + snapshot_id           = (known after apply)
          + throughput            = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }

      + metadata_options {
          + http_endpoint               = (known after apply)
          + http_put_response_hop_limit = (known after apply)
          + http_tokens                 = (known after apply)
        }

      + root_block_device {
          + delete_on_termination = (known after apply)
          + encrypted             = (known after apply)
          + iops                  = (known after apply)
          + throughput            = (known after apply)
          + volume_size           = (known after apply)
          + volume_type           = (known after apply)
        }
    }

  # aws_route53_record.org will be updated in-place
  ~ resource "aws_route53_record" "org" {
        id                               = "Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A"
        name                             = "sliderule.slideruleearth.io"
      ~ records                          = [
          - "52.41.92.196",
        ] -> (known after apply)
        # (6 unchanged attributes hidden)
    }

Plan: 3 to add, 2 to change, 3 to destroy.

Changes to Outputs:
  ~ ilb_id         = "i-03329bd36c19bbbe2" -> (known after apply)
  ~ ilb_ip_address = "52.41.92.196" -> (known after apply)
  ~ ilb_state      = "running" -> (known after apply)
  ~ monitor_id     = "i-08eba42107b54c36b" -> (known after apply)
  ~ monitor_state  = "running" -> (known after apply)
aws_launch_configuration.sliderule-instance: Destroying... [id=terraform-20230726141840086100000007]
aws_instance.ilb: Destroying... [id=i-03329bd36c19bbbe2]
aws_instance.monitor: Destroying... [id=i-08eba42107b54c36b]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 10s elapsed]
aws_instance.monitor: Still destroying... [id=i-08eba42107b54c36b, 10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 20s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 20s elapsed]
aws_instance.monitor: Still destroying... [id=i-08eba42107b54c36b, 20s elapsed]
aws_instance.monitor: Destruction complete after 30s
aws_instance.monitor: Creating...
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 30s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 30s elapsed]
aws_instance.monitor: Still creating... [10s elapsed]
aws_instance.ilb: Still destroying... [id=i-03329bd36c19bbbe2, 40s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 40s elapsed]
aws_instance.monitor: Creation complete after 12s [id=i-0f4108fe8e5b4c7be]
aws_instance.ilb: Destruction complete after 50s
aws_instance.ilb: Creating...
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 50s elapsed]
aws_instance.ilb: Still creating... [10s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m0s elapsed]
aws_instance.ilb: Creation complete after 12s [id=i-017c389cfb7b47955]
aws_route53_record.org: Modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m10s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 10s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m20s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 20s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m30s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 30s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m40s elapsed]
aws_route53_record.org: Still modifying... [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A, 40s elapsed]
aws_route53_record.org: Modifications complete after 46s [id=Z0526045IQLILBFI9THF_sliderule.slideruleearth.io_A]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 1m50s elapsed]
aws_launch_configuration.sliderule-instance: Still destroying... [id=terraform-20230726141840086100000007, 2m0s elapsed]
╷
│ Warning: Argument is deprecated
│ 
│   with aws_s3_bucket_object.docker-compose-config,
│   on config-files.tf line 4, in resource "aws_s3_bucket_object" "docker-compose-config":
│    4:   bucket = "sliderule"
│ 
│ Use the aws_s3_object resource instead
│ 
│ (and 13 more similar warnings elsewhere)
╵
╷
│ Error: deleting Auto Scaling Launch Configuration (terraform-20230726141840086100000007): ResourceInUse: Cannot delete launch configuration terraform-20230726141840086100000007 because it is attached to AutoScalingGroup terraform-20230726141849028500000008
│   status code: 400, request id: bb80075b-e180-4ce3-ae17-9195300e721d
│ 
│ 
╵

sliderule cmd-11: Update iter:<4> caught ProvisionCmdError exception: ProvisionCmdError('ps-server returned this error: sliderule cmd-11: Update iter:<4> FAILED with error: Processing Update sliderule cluster caught this exception: PS_InternalError("FAILED! Command \'[\'terraform\', \'-chdir=/ps_server/sliderule/terraform\', \'apply\', \'-auto-approve\', \'-var\', \'cluster_version=v3\', \'-var\', \'domain=slideruleearth.io\', \'-var\', \'is_public=True\', \'-var\', \'cluster_name=sliderule\', \'-var\', \'node_asg_min_capacity=40\', \'-var\', \'node_asg_max_capacity=100\', \'-var\', \'node_asg_desired_capacity=40\']\' returned non-zero exit status 1. for Update sliderule")')
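The capacity values themselves were not the problem. In the plan, `aws_autoscaling_group.sliderule-cluster` is only updated in place (min 7 -> 40, desired 7 -> 40, max 30 -> 100). The 40-7-100 intermediate state is invalid only because an autoscaling group requires min <= desired <= max, and ProvSys corrected it by raising desired to 40. Below is a minimal Python sketch of that normalization step and of assembling the `terraform apply` arguments in the same order as the `cmd submitted` line. The helper names, and the assumption that the `-chdir` path is derived from the cluster name, are hypothetical. This is not ProvSys's actual code.

```python
from typing import List, Tuple


def normalize_capacity(min_nodes: int, desired: int, max_nodes: int) -> Tuple[int, int, int]:
    """Return (min, desired, max) with desired clamped into [min, max].

    An autoscaling group requires min <= desired <= max, so a request such as
    40-7-100 (min-desired-max) must have its desired count raised to 40.
    """
    if min_nodes > max_nodes:
        raise ValueError(f"min ({min_nodes}) is greater than max ({max_nodes})")
    clamped = max(min_nodes, min(desired, max_nodes))
    return min_nodes, clamped, max_nodes


def build_apply_cmd(cluster_name: str, min_nodes: int, desired: int, max_nodes: int,
                    cluster_version: str = "v3", domain: str = "slideruleearth.io",
                    is_public: bool = True) -> List[str]:
    """Assemble a `terraform apply` argv in the same order as the logged command."""
    mn, des, mx = normalize_capacity(min_nodes, desired, max_nodes)
    tf_vars = {  # dicts keep insertion order, so the -var flags come out in this order
        "cluster_version": cluster_version,
        "domain": domain,
        "is_public": str(is_public),  # str(True) == "True", matching the log
        "cluster_name": cluster_name,
        "node_asg_min_capacity": str(mn),
        "node_asg_max_capacity": str(mx),
        "node_asg_desired_capacity": str(des),
    }
    # Assumption: the working directory follows /ps_server/<cluster_name>/terraform.
    cmd = ["terraform", f"-chdir=/ps_server/{cluster_name}/terraform", "apply", "-auto-approve"]
    for key, value in tf_vars.items():
        cmd += ["-var", f"{key}={value}"]
    return cmd
```

For example, `build_apply_cmd("sliderule", 40, 7, 100)` reproduces the argument list shown in the log, with desired raised from 7 to 40.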
cugarteblair commented 1 year ago

It looks like Terraform did not know how to handle the change in the AMI? Also, launch configurations have been deprecated (https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-configurations.html), and AWS is migrating away from launch configurations to launch templates.
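The plan supports this reading. `data.aws_ami.sliderule_cluster_ami` resolved to a different AMI (ami-0bc69fc2ee174d0a9 -> ami-0098740cce22bf29d), and the rendered `user_data` changed. As a result, Terraform marked the launch configuration and the `ilb` and `monitor` instances for replacement, even though the request only changed capacity. Terraform replaces resources destroy-first by default (`-/+` in the plan). AWS will not delete a launch configuration that is still attached to an autoscaling group, which produces the ResourceInUse error.

The AWS provider documentation recommends a `lifecycle { create_before_destroy = true }` block, with `name` omitted or `name_prefix` used, for launch configurations attached to autoscaling groups. Moving to `aws_launch_template` is the longer-term route. As an extra guard, ProvSys could inspect the plan before applying it.

The Python sketch below shows one way to do that. It is hypothetical: it assumes the plan is saved with `terraform plan -out=tfplan` and rendered with `terraform show -json tfplan`. The function names are illustrative and not part of ProvSys.

```python
import json
from typing import Any, Dict, List

# Hypothetical pre-flight check. Assumes the plan was saved with
# `terraform plan -out=tfplan` and rendered with `terraform show -json tfplan`.


def load_plan(json_text: str) -> Dict[str, Any]:
    """Parse the JSON text produced by `terraform show -json <planfile>`."""
    return json.loads(json_text)


def find_replacements(plan: Dict[str, Any]) -> List[str]:
    """Addresses of resources the plan will replace (delete + create, in either order)."""
    replaced = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions and "create" in actions:
            replaced.append(rc["address"])
    return replaced


def find_destroy_first_launch_configs(plan: Dict[str, Any]) -> List[str]:
    """Launch configurations that will be deleted before their replacement exists.

    ["delete", "create"] is Terraform's default replacement order; with
    `create_before_destroy = true` the plan shows ["create", "delete"] instead.
    Deleting a launch configuration that an autoscaling group still uses fails
    with ResourceInUse, which is the error reported in this issue.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_launch_configuration":
            continue
        if rc.get("change", {}).get("actions") == ["delete", "create"]:
            flagged.append(rc["address"])
    return flagged
```

Here is what the sketch would report for the plan in this issue. `find_replacements` would list `aws_instance.ilb`, `aws_instance.monitor`, and `aws_launch_configuration.sliderule-instance`. `find_destroy_first_launch_configs` would flag the launch configuration, which is the resource whose deletion failed.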