GCP Key rotation recreates NetApp connector and CVO

bryanheo commented 1 year ago

Hello,

This issue is related to https://github.com/NetApp/terraform-provider-netapp-cloudmanager/issues/153. The previous issue was closed because of a slow response (PTO). We are using version 22.2.0, and @chuyich suggested using 23.5.1 for the key rotation, but if we upgrade the version, the connector and CVOs are recreated (this happens with version 22.9.0 or higher). I am not sure what changed in 22.9.0, but it recreates the NetApp resources, so I cannot use a newer version for the key rotation. Could you investigate it?

Please do NOT close this case, because NetApp support engineers are referring to this issue.

Before upgrading the version:

% terraform plan               
module.use4.module.connector.netapp-cloudmanager_connector_gcp.this[0]: Refreshing state... [id=xxxxx-connector]
module.use4.module.systems["primary"].netapp-cloudmanager_cvo_gcp.this: Refreshing state... [id=xxxxx]

No changes. Your infrastructure matches the configuration.

After upgrading to version 22.9.0 or higher:

% terraform init -upgrade
Upgrading modules...
- use4 in ../../../tf-module-gcp-netapp
- use4.connector in ../../../tf-module-gcp-netapp/modules/connector
- use4.systems in ../../../tf-module-gcp-netapp/modules/system

Initializing the backend...

Initializing provider plugins...
- Finding netapp/netapp-cloudmanager versions matching "~> 22.9.0"...
- Finding hashicorp/local versions matching "~> 2.1"...
- Finding hashicorp/google versions matching "~> 3.74"...
- Installing netapp/netapp-cloudmanager v22.9.1...
- Installed netapp/netapp-cloudmanager v22.9.1 (signed by a HashiCorp partner, key ID F2524B23A2222E31)
- Using previously-installed hashicorp/local v2.4.0
- Using previously-installed hashicorp/google v3.90.1

% terraform plan
...
  # module.use4.module.connector.netapp-cloudmanager_connector_gcp.this[0] must be replaced
-/+ resource "netapp-cloudmanager_connector_gcp" "this" {
      ~ client_id             = "xxxxx" -> (known after apply)
      ~ id                    = "xxxxx-connector" -> (known after apply)
      ~ machine_type          = "n1-standard-4" -> "n2-standard-4" # forces replacement
        name                  = "xxxxx-connector"
        tags                  = [
            "http-server",
            "https-server",
            "netapp-ap-engg-np-netapp-connector",
        ]
        # (9 unchanged attributes hidden)
    }

  # module.use4.module.systems["primary"].netapp-cloudmanager_cvo_gcp.this must be replaced
-/+ resource "netapp-cloudmanager_cvo_gcp" "this" {
      ~ client_id                          = "xxxxx" -> (known after apply) # forces replacement
      ~ id                                 = "xxxxx" -> (known after apply)
        name                               = "netappgmtduse4pri"
      ~ svm_name                           = "svm_netappgmtduse4pri" -> (known after apply)
      ~ vpc0_node_and_data_connectivity    = "netapp" -> (known after apply) # forces replacement
      ~ vpc1_cluster_connectivity          = "netapp-np-cluster" -> (known after apply) # forces replacement
      ~ vpc2_ha_connectivity               = "netapp-np-ha" -> (known after apply) # forces replacement
      ~ vpc3_data_replication              = "netapp-np-repl" -> (known after apply) # forces replacement
      ~ vpc_id                             = "netapp" -> (known after apply) # forces replacement
      - writing_speed_state                = "NORMAL" -> null
        # (31 unchanged attributes hidden)
    }

chuyich commented 1 year ago

@bryanheo Based on the output you shared, there is one thing you can try. Please check whether machine_type is set in resource "netapp-cloudmanager_connector_gcp" in your resource file. I believe it is not set there, so the new version tried to use the NEW default value of machine_type, found that it differs from the old default value stored in your state file, and therefore tried to recreate the connector. You may add this line to your resource file and give it a try:

resource "netapp-cloudmanager_connector_gcp" "this" {
  # ... existing arguments ...
  machine_type = "n1-standard-4"
  # ... existing arguments ...
}

bryanheo commented 1 year ago

@chuyich The connector redeployment issue is resolved by defining machine_type, but the CVO is still recreated by the new version 23.5.1. Could you let me know the solution?

  # module.use4.module.systems["primary"].netapp-cloudmanager_cvo_gcp.this must be replaced
-/+ resource "netapp-cloudmanager_cvo_gcp" "this" {
      - capacity_tier                      = "cloudStorage" -> null # forces replacement
      ~ id                                 = "vsaworkingenvironment-xxx" -> (known after apply)
        name                               = "netappgmtduse4pri"
      + retries                            = 60 # forces replacement
      - svm_name                           = "svm_netappgmtduse4pri" -> null
      - tier_level                         = "standard" -> null
      - writing_speed_state                = "NORMAL" -> null
        # (35 unchanged attributes hidden)
    }

chuyich commented 1 year ago

@bryanheo There is a new release 23.8.0 which contains a bug fix that will help your case. Please use this version with the following update in your resource file:

resource "netapp-cloudmanager_cvo_gcp" "this" {
  name                = "netappgmtduse4pri"
  # ... existing arguments ...
  capacity_tier       = "cloudStorage"
  svm_name            = "svm_netappgmtduse4pri"
  tier_level          = "standard"
  writing_speed_state = "NORMAL"
  retries             = 60
  # ... existing arguments ...
}

chuyich commented 1 year ago

@bryanheo The new release 23.8.1 has the fix. You should not see any parameters, including the gcp_xxx ones, marked as forcing replacement when you run "terraform apply". It will look like this:

  ~ resource "netapp-cloudmanager_connector_gcp" "testgcpconnector" {
      + gcp_block_project_ssh_keys = false
      + gcp_enable_os_login        = true
      + gcp_enable_os_login_sk     = true
      + gcp_serial_port_enable     = true
        id                         = "xxxxx"
        name                       = "xxxxx"
        # (11 unchanged attributes hidden)
    }

The new version will set those values on your existing connector. The connector won't be recreated.

bryanheo commented 1 year ago

@chuyich Thank you for the update. I have tested the new version, and the connector resource is no longer replaced, but the CVO resource is still replaced, as shown below. Could you check it again?

% terraform init --upgrade
Upgrading modules...
- use4 in ../../../tf-module-gcp-netapp
- use4.connector in ../../../tf-module-gcp-netapp/modules/connector
- use4.systems in ../../../tf-module-gcp-netapp/modules/system

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/google versions matching "~> 3.74"...
- Finding netapp/netapp-cloudmanager versions matching "~> 23.8.1"...
- Finding hashicorp/local versions matching "~> 2.1"...
- Using previously-installed hashicorp/google v3.90.1
- Installing netapp/netapp-cloudmanager v23.8.1...
- Installed netapp/netapp-cloudmanager v23.8.1 (signed by a HashiCorp partner, key ID xxxxx)
- Using previously-installed hashicorp/local v2.4.0

Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html

Terraform has made some changes to the provider dependency selections recorded
in the .terraform.lock.hcl file. Review those changes and commit them to your
version control system if they represent changes you intended to make.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

% terraform plan
...
Terraform will perform the following actions:

  # module.use4.module.connector.netapp-cloudmanager_connector_gcp.this[0] will be updated in-place
  ~ resource "netapp-cloudmanager_connector_gcp" "this" {
      + gcp_block_project_ssh_keys = false
      + gcp_enable_os_login        = true
      + gcp_enable_os_login_sk     = true
      + gcp_serial_port_enable     = true
        id                         = "netapp-ap-engg-np-netapp-connector"
        name                       = "netapp-ap-engg-np-netapp-connector"
        tags                       = [
            "http-server",
            "https-server",
            "netapp-ap-engg-np-netapp-connector",
        ]
        # (11 unchanged attributes hidden)
    }

  # module.use4.module.systems["primary"].netapp-cloudmanager_cvo_gcp.this must be replaced
-/+ resource "netapp-cloudmanager_cvo_gcp" "this" {
      ~ id                                 = "vsaworkingenvironment-xxxx" -> (known after apply)
        name                               = "netappgmtduse4pri"
      + retries                            = 60
      ~ vpc0_node_and_data_connectivity    = "netapp" -> (known after apply) # forces replacement
      ~ vpc1_cluster_connectivity          = "netapp-np-cluster" -> (known after apply) # forces replacement
      ~ vpc2_ha_connectivity               = "netapp-np-ha" -> (known after apply) # forces replacement
      ~ vpc3_data_replication              = "netapp-np-repl" -> (known after apply) # forces replacement
      ~ vpc_id                             = "netapp" -> (known after apply) # forces replacement
        # (34 unchanged attributes hidden)
    }
...

chuyich commented 1 year ago

@bryanheo Would you please provide your resource file and state file so we can take a look? I tried to reproduce it on my side and didn't see this on the CVO part.

bryanheo commented 1 year ago

@chuyich The file has been uploaded to SharePoint, and Nelson Viegas has shared it with you. Could you check it?

wenjun666 commented 1 year ago

@bryanheo, @chuyich is out of the office for a week. The issue is possibly caused by "depends_on". According to this article, https://itnext.io/beware-of-depends-on-for-modules-it-might-bite-you-da4741caac70:

The depends_on meta-argument instructs Terraform to complete all actions on the dependency object (including Read actions) before performing actions on the object declaring the dependency. When the dependency object is an entire module, depends_on affects the order in which Terraform processes all of the resources and data sources associated with that module. Refer to Resource Dependencies and Data Resource Dependencies for more details.

In your case, cvo_gcp depends on the connector, and when there is an update on the connector, terraform plan will not evaluate the data source inside the cvo_gcp module, specifically "data.google_compute_subnetwork.primary.network". "network" is exported from the data source, and since the data source is not evaluated during plan, its value is only known after apply. We confirmed with the backend team that these values must force replacement because they cannot be modified in place. The article offers a few alternatives; I suggest you go through them all and see which one fits your needs best.
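
A minimal sketch of the pattern described above. The module paths, subnetwork name, and region are hypothetical (the actual configuration is not shown in this thread), and the two parts would live in separate directories. With a module-level depends_on, the data source read inside the system module is deferred whenever the connector has a pending change, so every attribute derived from it shows as (known after apply):

```hcl
# Root module (hypothetical layout).
module "connector" {
  source = "./modules/connector"
}

module "systems" {
  source = "./modules/system"

  # Module-level depends_on: if the connector has a pending change,
  # Terraform defers reading the data sources inside this module
  # until apply time.
  depends_on = [module.connector]
}

# Inside ./modules/system (hypothetical).
data "google_compute_subnetwork" "primary" {
  name   = "example-primary-subnet" # assumed name
  region = "us-east4"               # assumed region
}

resource "netapp-cloudmanager_cvo_gcp" "this" {
  # ... other arguments elided ...

  # Unknown during plan when the data source read is deferred,
  # so the plan reports "(known after apply) # forces replacement".
  vpc_id = data.google_compute_subnetwork.primary.network
}
```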

bryanheo commented 1 year ago

@wenjun666 Thank you for your update. One quick question: as mentioned above, we are using version 22.2.0, and the connector and CVOs are not recreated up to version 22.8.0. It only happens with version 22.9.0 or higher. Could you explain why the resources are not recreated up to 22.8.0 if the issue is caused by "depends_on"?

wenjun666 commented 1 year ago

@bryanheo In 22.9.0, we changed the "machine_type" default value from "n1-standard-4" to "n2-standard-4". This triggers an update on the connector, and hence an update on the CVO. The "depends_on" dependency issue is always there, but it only shows up in some cases. If you are using the older version, no update on the connector is triggered and everything is fine. But once there is an update on the connector, the problem is revealed.

bryanheo commented 1 year ago

@wenjun666 @chuyich I have removed depends_on from the terraform resources (netapp-cloudmanager_connector_gcp and netapp-cloudmanager_cvo_gcp), but it still shows the same result.

chuyich commented 1 year ago

@bryanheo Can you share the 'terraform plan' output and also the resource files like you did earlier? We would like to see what changes have been made. Also, based on the earlier setup, these were impacted:

      ~ vpc0_node_and_data_connectivity = "netapp" -> (known after apply) # forces replacement
      ~ vpc1_cluster_connectivity = "netapp-np-cluster" -> (known after apply) # forces replacement
      ~ vpc2_ha_connectivity = "netapp-np-ha" -> (known after apply) # forces replacement
      ~ vpc3_data_replication = "netapp-np-repl" -> (known after apply) # forces replacement
      ~ vpc_id = "netapp" -> (known after apply) # forces replacement

We also found that, in the tf files you shared a while ago, only the parameters referring to local.networks have the issue. That might give you some ideas: check whether anything differs between those and the non-affected parameters in the resource.

bryanheo commented 1 year ago

@chuyich Thank you for your message. As requested, I have uploaded the files via Nelson's SharePoint. Could you have a look?

chuyich commented 1 year ago

@bryanheo Sure! Would you please share the output of terraform plan? Just want to double check it. Thanks.

bryanheo commented 1 year ago

@chuyich I have uploaded the new file terraform-netapp-before-after-upgrade.zip to the SharePoint. It includes before-upgrade and after-upgrade directories, and there is a terraform plan output file (terraform-plan.txt) in each terraform-netapp-np directory. As mentioned earlier, we did not have any issues up to 22.8.x, but with 22.9.0 or higher the resources are recreated. Could you have a look?

wenjun666 commented 1 year ago

@bryanheo I think the commented-out depends_on lines are not causing the issue. This is an implicit dependency issue. Here is a workaround that should work in your case. Can you try editing the following lines:

vpc0_node_and_data_connectivity = local.networks.primary
vpc1_cluster_connectivity       = local.networks.cluster
vpc2_ha_connectivity            = local.networks.ha
vpc3_data_replication           = local.networks.replication
vpc_id                          = local.networks.primary

For "local.networks.primary" and the other entries, instead of getting the network from the data source, use local variables to store the network names for this terraform plan. Let me know how the terraform plan goes. Thank you.
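
One possible shape for such a locals block, as a sketch only. The network names are taken from the plan output earlier in this thread and the keys match the local.networks.* references above, but the original locals block is not shown, so the layout is an assumption:

```hcl
# Static network names instead of values read from a data source,
# so they are known at plan time. Layout is assumed; names come
# from the plan output shared earlier in this thread.
locals {
  networks = {
    primary     = "netapp"
    cluster     = "netapp-np-cluster"
    ha          = "netapp-np-ha"
    replication = "netapp-np-repl"
  }
}
```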

bryanheo commented 10 months ago

@wenjun666 @chuyich As you suggested, I have tried the following two approaches on v22.8.x instead of the data source, and both work fine. I have applied the changes to the dev environment and am monitoring it for the key rotation. I will let you know the update.

Using static values

vpc_id                             = "netapp"
vpc0_node_and_data_connectivity    = "netapp"
vpc1_cluster_connectivity          = "netapp-np-cluster"
vpc2_ha_connectivity               = "netapp-np-ha"
vpc3_data_replication              = "netapp-np-repl"

Using ignore_changes

lifecycle {
  ignore_changes = [
    vpc0_node_and_data_connectivity,
    vpc1_cluster_connectivity,
    vpc2_ha_connectivity,
    vpc3_data_replication,
    vpc_id,
  ]
}

chuyich commented 7 months ago

Closing this issue since it has not been updated for 2 months. Please open another issue if needed.

NetApp / terraform-provider-netapp-cloudmanager

GCP Key rotation recreates NetApp connector and CVO #174
