IBM-Cloud / terraform-provider-ibm

https://registry.terraform.io/providers/IBM-Cloud/ibm/latest/docs
Mozilla Public License 2.0
341 stars 670 forks source link

Job that modifies params of Power VSI hangs - runs for 60 minutes and timeouts. #4309

Open gasgithub opened 1 year ago

gasgithub commented 1 year ago

Community Note

Terraform CLI and Terraform IBM Provider Version

Affected Resource(s)

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

Repo with terrafrom files - https://github.com/gasgithub/terraform-plans/tree/main/ibm-power

Steps to reproduce the issue

1.Create Schematic workspace using github repo 2.Run terraform to create instance 3.Change props in schematic workspace modifying CPU amount. 4.Generate/apply new plan ===> this hangs till timeout failing the Apply job

Debug Output

 2023/01/26 14:55:00 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m30s elapsed]
 2023/01/26 14:55:10 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m40s elapsed]
2023/01/26 14:55:20 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m50s elapsed]
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply | Error: context deadline exceeded
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply |   on main.tf line 44, in resource "ibm_pi_instance" "instance":
 2023/01/26 14:55:29 Terraform apply |   44: resource "ibm_pi_instance" "instance" {
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform APPLY error: Terraform APPLY errorexit status 1
 2023/01/26 14:55:29 Could not execute job: Error : Terraform APPLY errorexit status 1

Expected Behavior

Job finishes after successful modification of VSI params.

Actual Behavior

Params are successfully modified (checked via IBM Cloud web UI), but the script continues to run till timeout and finishes with Failed status.

rmoralesjr commented 1 year ago

What colo did this issue occur? Thanks

gasgithub commented 1 year ago

What do you mean by "colo"? Power Systems Virtual Server workspace is created in Sydney 05.

rmoralesjr commented 1 year ago

The logs show 2 pvm instance create operations recently

Jan 25 09:13:10 syd-servicebroker-58768d98c7-8fxgx syd-servicebroker DEBUG 15:13:10.967Z syd-servicebroker-58768d98c7-8fxgx[8ec374] restapi authorize_util.go:55/prettyPrint â–¶ Service Broker Request: { "RequestType": "POST", "RequestURL": "/pcloud/v1/cloud-instances/a21fa4bc-6989-46ab-b572-17907d27f334/pvm-instances", "CRN": "crn:v1:bluemix:public:power-iaas:syd05:a/b71ac2564ef0b98f1032d189795994dc:a21fa4bc-6989-46ab-b572-17907d27f334::", and

Jan 25 09:56:32 syd-servicebroker-58768d98c7-dsm7v syd-servicebroker DEBUG 15:56:32.83Z syd-servicebroker-58768d98c7-dsm7v[8d5f52] restapi authorize_util.go:55/prettyPrint â–¶ Service Broker Request: { "RequestType": "POST", "RequestURL": "/pcloud/v1/cloud-instances/a21fa4bc-6989-46ab-b572-17907d27f334/pvm-instances", "CRN": "crn:v1:bluemix:public:power-iaas:syd05:a/b71ac2564ef0b98f1032d189795994dc:a21fa4bc-6989-46ab-b572-17907d27f334::",

so is this issue regarding an instance named gas-power or gas-rhel8 or something different?

Also there are no log lines with "PUT pvm-instances a21fa4bc-6989-46ab-b572-17907d27f334" so that means no update pvm instance operations were received for that cloud instance which is not consistent with the problem description.

gasgithub commented 1 year ago

The latest instance I'm playing with is gas-terra-rhel-2 , which in web console console has ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a

It was created via schematics job id: 4360b71a-ba0b-43a4-a5de-1432049cdc48

2023/01/26 13:16:36 Request: activitId=121d39c91af002b5ba71208f5c29cff8, account=b71ac2564ef0b98f1032d189795994dc, owner=xyz, requestID=4360b71a-ba0b-43a4-a5de-1432049cdc48
...
 2023/01/26 13:27:35 Terraform apply | ibm_pi_instance.instance: Creation complete after 9m44s [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
 2023/01/26 13:27:35 Terraform apply | data.ibm_pi_instance.data_source_instance: Reading...
 2023/01/26 13:27:38 Terraform apply | data.ibm_pi_instance.data_source_instance: Read complete after 3s [id=b03f3c0c-e27d-43db-8bac-9aed05f2668a]
 2023/01/26 13:27:38 Terraform apply | 
 2023/01/26 13:27:38 Terraform apply | Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
 2023/01/26 13:27:38 Command finished successfully.

Then it was modified by job following job, which failed:

2023/01/26 13:54:36 -----  New Workspace Action  -----
 2023/01/26 13:54:36 Request: activitId=695d1900885264021bd4067d4163a0f3, account=b71ac2564ef0b98f1032d189795994dc, owner=xyz, requestID=ab198ada-0478-45fa-9193-6f8a1d839322
 2023/01/26 13:54:36 Related Activity: action=APPLY, workspaceID=eu-de.workspace.power-tests.f382857d, processedBy=orchestrator-698ff5c987-gbl5d
 2023/01/26 13:54:36 Related Workspace: name=power-tests, sourcerelease=(not specified), sourceurl=https://github.com/gasgithub/terraform-plans, folder=ibm-power
....
 2023/01/26 13:55:02 Terraform init | Terraform has been successfully initialized!
 2023/01/26 13:55:02 Command finished successfully.

 2023/01/26 13:55:02 -----  Terraform APPLY  -----

 2023/01/26 13:55:02 Starting command: terraform13 apply -state=terraform.tfstate -var-file=schematics.tfvars -auto-approve -no-color -lock=false
 2023/01/26 13:55:02 Starting command: terraform13 apply -state=terraform.tfstate -var-file=schematics.tfvars -auto-approve -no-color -lock=false
 2023/01/26 13:55:11 Terraform apply | random_id.name1: Refreshing state... [id=BTw]
 2023/01/26 13:55:12 Terraform apply | data.ibm_pi_key.data_source_key: Refreshing state... [id=gas-ssh]
 2023/01/26 13:55:12 Terraform apply | data.ibm_pi_image.data_source_image: Refreshing state... [id=daa36e53-5974-498d-8a48-8528c853ede1]
 2023/01/26 13:55:12 Terraform apply | ibm_pi_network.network[0]: Refreshing state... [id=a21fa4bc-6989-46ab-b572-17907d27f334/4f090ab6-dbf6-42a5-b527-fabcb005859b]
 2023/01/26 13:55:14 Terraform apply | data.ibm_pi_public_network.data_source_network: Refreshing state... [id=4f090ab6-dbf6-42a5-b527-fabcb005859b]
 2023/01/26 13:55:17 Terraform apply | ibm_pi_instance.instance: Refreshing state... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
 2023/01/26 13:55:20 Terraform apply | data.ibm_pi_instance.data_source_instance: Refreshing state... [id=b03f3c0c-e27d-43db-8bac-9aed05f2668a]
 2023/01/26 13:55:29 Terraform apply | ibm_pi_instance.instance: Modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
 2023/01/26 13:55:39 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 10s elapsed]
....
 2023/01/26 14:55:10 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m40s elapsed]
 2023/01/26 14:55:20 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m50s elapsed]
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply | Error: context deadline exceeded
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply |   on main.tf line 44, in resource "ibm_pi_instance" "instance":
 2023/01/26 14:55:29 Terraform apply |   44: resource "ibm_pi_instance" "instance" {
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform apply | 
 2023/01/26 14:55:29 Terraform APPLY error: Terraform APPLY errorexit status 1
 2023/01/26 14:55:29 Could not execute job: Error : Terraform APPLY errorexit status 1
rmoralesjr commented 1 year ago

The logs show the account ID: b71ac2564ef0b98f1032d189795994dc, cloud instance ID: a21fa4bc-6989-46ab-b572-17907d27f334, but not the instance ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a Also there are no instances with the pi_instance_name field of gas-terra-rhel-2. This is usual in that the reported instance ID and name are not present. I am looking at the syd05 production data center service broker logs. We'll continue to investigate.

gasgithub commented 1 year ago

Please check the update from IBM support:

Instance ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a | Instance Name: gas-terra-rhel-2 

Jan 26 13:55:37(UTC) kube-bs6u6j9s0qrbkqbmeb9g-poweriaassy-default-00000161 syd-servicebroker-58768d98c7-8fxgx.log
 power-iaas: update pvm-instance cbaa3ab004f344a2882f3b30ff39e814/b03f3c0c-e27d-43db-8bac-9aed05f2668a success 
 {"memory":4,"pinPolicy":"none","procType":"shared","processors":1,"serverName":"gas-terra-rhel-2",
"statusUrl":"/pcloud/v1/cloud-instances/cbaa3ab004f344a2882f3b30ff39e814/pvm-instances/b03f3c0c-e27d-43db-8bac-9aed05f2668a"}

Based on the log above, the operation was a success and returned a 202 response code. Based on this, we assume the issue lies with terraform provider. 
rmoralesjr commented 1 year ago

I see the logs for syd-servicebroker-58768d98c7-8fxgx pod but my search for text string "b03f3c0c-e27d-43db-8bac-9aed05f2668a" is still not finding a match. As far as the 202 return code it means the service broker accepted the request but the async update work was still probably going on. After receiving the 202 code the terraform provider code will then poll for the instance's status until finds certain status before it continues. It is waiting for the instance's status to be Available (and health status of OK) state or Error state and this is what is probably timing out. My guess is the instance is stuck is a health state that is not OK. Ask whoever from IBM support that reported the details above if they can see the relevant service brokers logs to see what the instance's status and health status are.