Open gasgithub opened 1 year ago
What colo did this issue occur? Thanks
What do you mean by "colo"? Power Systems Virtual Server workspace is created in Sydney 05.
The logs show 2 pvm instance create operations recently
Jan 25 09:13:10 syd-servicebroker-58768d98c7-8fxgx syd-servicebroker DEBUG 15:13:10.967Z syd-servicebroker-58768d98c7-8fxgx[8ec374] restapi authorize_util.go:55/prettyPrint â–¶ Service Broker Request: { "RequestType": "POST", "RequestURL": "/pcloud/v1/cloud-instances/a21fa4bc-6989-46ab-b572-17907d27f334/pvm-instances", "CRN": "crn:v1:bluemix:public:power-iaas:syd05:a/b71ac2564ef0b98f1032d189795994dc:a21fa4bc-6989-46ab-b572-17907d27f334::",
and
Jan 25 09:56:32 syd-servicebroker-58768d98c7-dsm7v syd-servicebroker DEBUG 15:56:32.83Z syd-servicebroker-58768d98c7-dsm7v[8d5f52] restapi authorize_util.go:55/prettyPrint â–¶ Service Broker Request: { "RequestType": "POST", "RequestURL": "/pcloud/v1/cloud-instances/a21fa4bc-6989-46ab-b572-17907d27f334/pvm-instances", "CRN": "crn:v1:bluemix:public:power-iaas:syd05:a/b71ac2564ef0b98f1032d189795994dc:a21fa4bc-6989-46ab-b572-17907d27f334::",
so is this issue regarding an instance named gas-power or gas-rhel8 or something different?
Also there are no log lines with "PUT pvm-instances a21fa4bc-6989-46ab-b572-17907d27f334" so that means no update pvm instance operations were received for that cloud instance which is not consistent with the problem description.
The latest instance I'm playing with is gas-terra-rhel-2
, which in web console console has ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a
It was created via schematics job id: 4360b71a-ba0b-43a4-a5de-1432049cdc48
2023/01/26 13:16:36 Request: activitId=121d39c91af002b5ba71208f5c29cff8, account=b71ac2564ef0b98f1032d189795994dc, owner=xyz, requestID=4360b71a-ba0b-43a4-a5de-1432049cdc48
...
2023/01/26 13:27:35 Terraform apply | ibm_pi_instance.instance: Creation complete after 9m44s [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
2023/01/26 13:27:35 Terraform apply | data.ibm_pi_instance.data_source_instance: Reading...
2023/01/26 13:27:38 Terraform apply | data.ibm_pi_instance.data_source_instance: Read complete after 3s [id=b03f3c0c-e27d-43db-8bac-9aed05f2668a]
2023/01/26 13:27:38 Terraform apply |
2023/01/26 13:27:38 Terraform apply | Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
2023/01/26 13:27:38 Command finished successfully.
Then it was modified by job following job, which failed:
2023/01/26 13:54:36 [1m----- New Workspace Action -----[21m[0m
2023/01/26 13:54:36 Request: activitId=695d1900885264021bd4067d4163a0f3, account=b71ac2564ef0b98f1032d189795994dc, owner=xyz, requestID=ab198ada-0478-45fa-9193-6f8a1d839322
2023/01/26 13:54:36 Related Activity: action=APPLY, workspaceID=eu-de.workspace.power-tests.f382857d, processedBy=orchestrator-698ff5c987-gbl5d
2023/01/26 13:54:36 Related Workspace: name=power-tests, sourcerelease=(not specified), sourceurl=https://github.com/gasgithub/terraform-plans, folder=ibm-power
....
2023/01/26 13:55:02 Terraform init | Terraform has been successfully initialized!
2023/01/26 13:55:02 Command finished successfully.
2023/01/26 13:55:02 [1m----- Terraform APPLY -----[21m[0m
2023/01/26 13:55:02 [34mStarting command: terraform13 apply -state=terraform.tfstate -var-file=schematics.tfvars -auto-approve -no-color -lock=false[39m[0m
2023/01/26 13:55:02 Starting command: terraform13 apply -state=terraform.tfstate -var-file=schematics.tfvars -auto-approve -no-color -lock=false
2023/01/26 13:55:11 Terraform apply | random_id.name1: Refreshing state... [id=BTw]
2023/01/26 13:55:12 Terraform apply | data.ibm_pi_key.data_source_key: Refreshing state... [id=gas-ssh]
2023/01/26 13:55:12 Terraform apply | data.ibm_pi_image.data_source_image: Refreshing state... [id=daa36e53-5974-498d-8a48-8528c853ede1]
2023/01/26 13:55:12 Terraform apply | ibm_pi_network.network[0]: Refreshing state... [id=a21fa4bc-6989-46ab-b572-17907d27f334/4f090ab6-dbf6-42a5-b527-fabcb005859b]
2023/01/26 13:55:14 Terraform apply | data.ibm_pi_public_network.data_source_network: Refreshing state... [id=4f090ab6-dbf6-42a5-b527-fabcb005859b]
2023/01/26 13:55:17 Terraform apply | ibm_pi_instance.instance: Refreshing state... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
2023/01/26 13:55:20 Terraform apply | data.ibm_pi_instance.data_source_instance: Refreshing state... [id=b03f3c0c-e27d-43db-8bac-9aed05f2668a]
2023/01/26 13:55:29 Terraform apply | ibm_pi_instance.instance: Modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a]
2023/01/26 13:55:39 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 10s elapsed]
....
2023/01/26 14:55:10 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m40s elapsed]
2023/01/26 14:55:20 Terraform apply | ibm_pi_instance.instance: Still modifying... [id=a21fa4bc-6989-46ab-b572-17907d27f334/b03f3c0c-e27d-43db-8bac-9aed05f2668a, 59m50s elapsed]
2023/01/26 14:55:29 Terraform apply |
2023/01/26 14:55:29 Terraform apply | Error: context deadline exceeded
2023/01/26 14:55:29 Terraform apply |
2023/01/26 14:55:29 Terraform apply | on main.tf line 44, in resource "ibm_pi_instance" "instance":
2023/01/26 14:55:29 Terraform apply | 44: resource "ibm_pi_instance" "instance" {
2023/01/26 14:55:29 Terraform apply |
2023/01/26 14:55:29 Terraform apply |
2023/01/26 14:55:29 [1m[31mTerraform APPLY error: Terraform APPLY errorexit status 1[39m[0m
2023/01/26 14:55:29 [1m[31mCould not execute job: Error : Terraform APPLY errorexit status 1[39m[0m
The logs show the account ID: b71ac2564ef0b98f1032d189795994dc, cloud instance ID: a21fa4bc-6989-46ab-b572-17907d27f334, but not the instance ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a Also there are no instances with the pi_instance_name field of gas-terra-rhel-2. This is usual in that the reported instance ID and name are not present. I am looking at the syd05 production data center service broker logs. We'll continue to investigate.
Please check the update from IBM support:
Instance ID: b03f3c0c-e27d-43db-8bac-9aed05f2668a | Instance Name: gas-terra-rhel-2
Jan 26 13:55:37(UTC) kube-bs6u6j9s0qrbkqbmeb9g-poweriaassy-default-00000161 syd-servicebroker-58768d98c7-8fxgx.log
power-iaas: update pvm-instance cbaa3ab004f344a2882f3b30ff39e814/b03f3c0c-e27d-43db-8bac-9aed05f2668a success
{"memory":4,"pinPolicy":"none","procType":"shared","processors":1,"serverName":"gas-terra-rhel-2",
"statusUrl":"/pcloud/v1/cloud-instances/cbaa3ab004f344a2882f3b30ff39e814/pvm-instances/b03f3c0c-e27d-43db-8bac-9aed05f2668a"}
Based on the log above, the operation was a success and returned a 202 response code. Based on this, we assume the issue lies with terraform provider.
I see the logs for syd-servicebroker-58768d98c7-8fxgx pod but my search for text string "b03f3c0c-e27d-43db-8bac-9aed05f2668a" is still not finding a match. As far as the 202 return code it means the service broker accepted the request but the async update work was still probably going on. After receiving the 202 code the terraform provider code will then poll for the instance's status until finds certain status before it continues. It is waiting for the instance's status to be Available (and health status of OK) state or Error state and this is what is probably timing out. My guess is the instance is stuck is a health state that is not OK. Ask whoever from IBM support that reported the details above if they can see the relevant service brokers logs to see what the instance's status and health status are.
Community Note
Terraform CLI and Terraform IBM Provider Version
Affected Resource(s)
Terraform Configuration Files
Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.
Repo with terrafrom files - https://github.com/gasgithub/terraform-plans/tree/main/ibm-power
Steps to reproduce the issue
1.Create Schematic workspace using github repo 2.Run terraform to create instance 3.Change props in schematic workspace modifying CPU amount. 4.Generate/apply new plan ===> this hangs till timeout failing the Apply job
Debug Output
Expected Behavior
Job finishes after successful modification of VSI params.
Actual Behavior
Params are successfully modified (checked via IBM Cloud web UI), but the script continues to run till timeout and finishes with Failed status.