Recreate RDS instances 90% of the time

screwyy commented 4 years ago

Terraform Version

Terraform v0.12.20

provider.flexibleengine v1.12.0
provider.mysql v1.9.0

Affected Resource(s)

flexibleengine_rds_instance_v3

Terraform Configuration Files

variable "spark_clients_rds_mysql_version" {
  description = "MySQL version"
  default     = "5.7"
}

variable "spark_clients_rds_mysql_root_password" {
  description = "MySQL root password"
  default     = "Micros123!"
}

variable "spark_clients_rds_mysql_backup_strategy" {
  type = map(string)
  default = {
    start_time = "05:00-06:00"
    keep_days  = 2
  }
}

variable "spark_clients_rds_mysql_port" {
  description = "MySQL listening port"
  default     = "3306"
}

variable "spark_clients_rds_mysql_rules" {
  type        = map(string)
  description = "Permit traffic on tcp 3306 port"
  default = {
    "0" = "ingress,IPv4,tcp,3306,3306,0.0.0.0/0"
    "1" = "ingress,IPv4,icmp,0,0,0.0.0.0/0"
  }
}

/* Security groups */

module "spark_clients_rds_mysql_sg" {
  source = "git::ssh://git@sourcehub.orangeapplicationsforbusiness.com/MT0117217/malima-flexible-engine/terraform/modules/networking.sg.git?ref=v1.0.0"
  role             = "skcmysql"
  network_sg_rules = var.spark_clients_rds_mysql_rules
  env              = var.env
  project          = var.project
  type             = "pri"
}

/* RDS Instance */

resource "flexibleengine_rds_instance_v3" "spark_clients_rds_mysql" {
  availability_zone = [var.az, var.az_backup]
  db {
    password = var.spark_clients_rds_mysql_root_password
    port     = var.spark_clients_rds_mysql_port
    type     = "MySQL"
    version  = var.spark_clients_rds_mysql_version
  }
  flavor = "rds.mysql.s1.large.ha"
  name = "${var.env}${var.project}${var.sprovider}${var.vpc_config_tru["type"]}skcmy01"
  security_group_id = module.spark_clients_rds_mysql_sg.network_sg_id
  subnet_id = module.vpc_tru_subnet_pri.network_id
  volume {
    type = "COMMON"
    size = 40
  }
  vpc_id = var.vpc_tru
  backup_strategy {
    start_time = var.spark_clients_rds_mysql_backup_strategy["start_time"]
    keep_days  = var.spark_clients_rds_mysql_backup_strategy["keep_days"]
  }
  ha_replication_mode = "async"
  param_group_id = flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters.id
  depends_on = [
    module.spark_clients_rds_mysql_sg,
    flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters,
  ]
}

resource "flexibleengine_rds_parametergroup_v3" "spark_clients_rds_mysql_parameters" {
  name = "spark_clients_rds_mysql_pg"
  description = "spark_clients_rds_mysql_parameters"
  values = {
    max_connections              = "800"
    autocommit                   = "ON"
    innodb_buffer_pool_instances = "2"
    innodb_buffer_pool_size      = "16000000000"
    table_open_cache             = "16000"
  }
  datastore {
    type    = "mysql"
    version = var.spark_clients_rds_mysql_version
  }
}

Debug Output

The Gist contains four files from two runs of terraform plan: one good run (_good.txt) and one bad run (_bad.txt), each with the usual terraform output and the DEBUG output. I included two runs to show that in one run terraform says No changes. Infrastructure is up-to-date. (_good.txt), and a couple of minutes later, without anything being modified, it wants to recreate an RDS instance (_bad.txt).

Expected Behavior

It should have said No changes. Infrastructure is up-to-date. on every run of terraform plan.

Actual Behavior

It wants to recreate the RDS instance. It's not always the same instance (we have ~10 RDS instances); it's random and driving us crazy. Even if we run terraform apply and recreate the RDS instance and parameter group, on the next terraform plan there is a very high probability that it will want to recreate the same RDS instance or another one. We have done this a couple of times, and this is where we are.

Steps to Reproduce

Run terraform plan a couple of times.

ShiChangkuo commented 4 years ago

@screwyy from test_output_bad.txt, I found that the ID of resource flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters is nil when refreshing, so the provider plans to create a new resource, and flexibleengine_rds_instance_v3.spark_clients_rds_mysql is then forced to be replaced.

According to the source code, this means we got a 404 error when getting the RDS parameter group.

Could you check whether the RDS parameter group exists on the console before running terraform plan?

ShiChangkuo commented 4 years ago

As there are many resources, you can refresh and show a specific resource with the following commands:

terraform refresh -target=xxx
terraform state show xxx
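
For the parameter group in this report, the two commands above could be wrapped as follows. This is only a minimal sketch, assuming it is run from the initialized working directory; the helper name inspect_resource is hypothetical and not part of Terraform, and the example address is taken from the configuration in the report.

```shell
# inspect_resource is a hypothetical helper name, not a Terraform command.
# It refreshes a single resource, then prints its state so that the stored
# ID can be checked by eye.
inspect_resource() {
  addr="$1"
  # Stop early if the targeted refresh itself fails.
  terraform refresh -target="$addr" || return 1
  terraform state show "$addr"
}

# Example call (address taken from the configuration in the report):
# inspect_resource flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters
```

If the refresh has dropped the resource from the state, terraform state show is expected to fail for that address, which would match the nil ID described above.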

screwyy commented 4 years ago

@ShiChangkuo These parameter groups were created with terraform, and as you can see they exist (and terraform plan wants to recreate each of them on almost every run):

parameter_group

Here you can see that the parameter group was applied successfully in December 2019, proof that it has existed for a long time.

apply_parameter

here are the outputs of terraform state show

ShiChangkuo commented 4 years ago

@screwyy If the RDS parameter group has existed since December 2019, it seems that something is wrong on the API side. You can log the API request body by setting OS_DEBUG=1 in your environment and then running terraform refresh -target=flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters. Thanks.
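
One way to capture that output in a file that can be attached is sketched below. OS_DEBUG=1 comes from the suggestion above; TF_LOG and TF_LOG_PATH are Terraform's own logging variables and are added here as an assumption that the provider's request logging appears at DEBUG level. The helper name capture_refresh_log and the default log file name are hypothetical.

```shell
# capture_refresh_log is a hypothetical helper name.
# OS_DEBUG=1 is taken from the suggestion above. TF_LOG and TF_LOG_PATH are
# standard Terraform logging variables; using them here is an assumption
# about where the provider's request logging ends up.
capture_refresh_log() {
  addr="$1"
  logfile="${2:-refresh_debug.log}"   # hypothetical default file name
  OS_DEBUG=1 TF_LOG=DEBUG TF_LOG_PATH="$logfile" \
    terraform refresh -target="$addr"
}

# Example call:
# capture_refresh_log flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters
```

Debug logs can contain credentials and tokens, so the file should be reviewed before it is shared.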

screwyy commented 4 years ago

here is the gist

ShiChangkuo commented 4 years ago

Sorry, I can't find any exception in the log, but I'm sure that the ID of flexibleengine_rds_parametergroup_v3.spark_clients_rds_mysql_parameters was set to empty during terraform plan. Maybe the issue does not reproduce reliably.

Could you reproduce it with OS_DEBUG=1 following your steps, and attach the log file?

screwyy commented 4 years ago

Hello, here is the gist.

The good part is when terraform says No changes. Infrastructure is up-to-date. for the RDS instance customer_alarms_rds_mysql and the parameter group associated with this instance, customer_alarms_rds_mysql_parameters.

The bad part is when I ran terraform plan with only 4 RDS instances + their parameters, and terraform wants to recreate customer_alarms_rds_mysql + customer_alarms_rds_mysql_parameters as if the parameter group had not been created, even though a couple of minutes before (good part in the gist) it said No changes. Infrastructure is up-to-date. for customer_alarms_rds_mysql.

I couldn't replicate the issue with only one RDS instance + parameter group. It seems that the issue happens when there are multiple RDS instances + parameter groups. Also, when targeting only the parameter groups, it always wants to create them as if the parameter groups don't exist.
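
Since the problem is intermittent, it may help to measure how often it occurs. The sketch below runs terraform plan several times with the -detailed-exitcode flag (exit code 0 means no changes, 1 means an error, 2 means changes are pending) and counts the runs that report changes. The helper name count_flapping_plans and the output file names are hypothetical.

```shell
# count_flapping_plans is a hypothetical helper name.
# It runs `terraform plan -detailed-exitcode` N times (default 5), saves each
# plan output to its own file, and counts the runs that exit with code 2
# (changes pending).
count_flapping_plans() {
  runs="${1:-5}"
  changed=0
  i=1
  while [ "$i" -le "$runs" ]; do
    terraform plan -detailed-exitcode -no-color > "plan_run_${i}.txt" 2>&1
    rc=$?
    if [ "$rc" -eq 2 ]; then
      changed=$((changed + 1))
    elif [ "$rc" -ne 0 ]; then
      echo "run $i: terraform plan failed, see plan_run_${i}.txt" >&2
    fi
    i=$((i + 1))
  done
  echo "$changed of $runs runs reported changes"
}

# Example call: ten consecutive plans with nothing modified in between.
# count_flapping_plans 10
```

Comparing a saved plan that reports changes with one that does not shows which resource was affected in that run.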

screwyy commented 4 years ago

@ShiChangkuo do you have any updates on this? Did you have time to check the gist?

ShiChangkuo commented 3 years ago

@screwyy sorry for the late response.

It seems that huaweicloud/golangsdk#515 can solve the issue. Do you have time to test it again?

ShiChangkuo commented 3 years ago

@screwyy I'm going to close this issue because there have been no updates for a long time.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

FlexibleEngineCloud / terraform-provider-flexibleengine