IBM-Cloud / hpc-cluster-lsf

IBM Spectrum LSF - IBM Cloud
https://cloud.ibm.com/docs/ibm-spectrum-lsf?topic=ibm-spectrum-lsf-getting-started-tutorial
Apache License 2.0

Timeout provisioning LSF cluster in both the IBM Cloud portal and via CLI #5

Closed marconetto closed 1 year ago

marconetto commented 2 years ago

I've tried to provision an LSF cluster both in the IBM Cloud web portal and via Terraform, following the instructions in this repo. In both cases I get the timeout message below. What is the recommended way to provision the cluster?

2022/04/20 18:56:35 Terraform apply | ibm_is_instance.master_candidate[0]: Still creating... [29m40s elapsed]
2022/04/20 18:56:45 Terraform apply | ibm_is_instance.master_candidate[0]: Still creating... [29m50s elapsed]
2022/04/20 18:56:55 Terraform apply | ibm_is_instance.master_candidate[0]: Still creating... [30m0s elapsed]
2022/04/20 18:56:55 Terraform apply |
2022/04/20 18:56:55 Terraform apply | Error: timeout while waiting for state to become 'running, available, failed, ' (last state: 'provisioning', timeout: 30m0s)
2022/04/20 18:56:55 Terraform apply |
2022/04/20 18:56:55 Terraform apply | on vpc.tf line 427, in resource "ibm_is_instance" "master_candidate":
2022/04/20 18:56:55 Terraform apply | 427: resource "ibm_is_instance" "master_candidate" {
2022/04/20 18:56:55 Terraform apply |
2022/04/20 18:56:55 Terraform apply |
2022/04/20 18:56:55 Terraform APPLY error: Terraform APPLY errorexit status 1
2022/04/20 18:56:55 Could not execute job: Error : Terraform APPLY errorexit status 1

I'm using macOS 12.3, Terraform 1.1.7, and ibmcloud CLI 2.6.0+df1953d-2022-03-24T15:19:15+00:00.

AugieMena3 commented 2 years ago

Hi @marconetto, see the information here on getting help and support: https://cloud.ibm.com/docs/ibm-spectrum-lsf?topic=ibm-spectrum-lsf-getting-help-and-support

The error does not seem to indicate an issue with Spectrum LSF. I would suggest calling IBM Support as mentioned there, or opening an IBM Cloud support case directly via the web portal. Please respond here if the issue is not resolved through that.

AugieMena3 commented 2 years ago

Hi @marconetto, any update on the issue you were seeing?

marconetto commented 2 years ago

Hi @AugieMena3, thanks for asking. Yesterday I tried a few more times and it worked once. Strangely, when it worked it was really fast (less than 5 minutes). Before contacting support, I'm modifying the Terraform files to add timeout specifications for the resources. Since Terraform gives up after 30 minutes, I will try increasing this threshold and run some experiments. If that works, I can make a PR here. If it doesn't, I will then check the support link.
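
For reference, a timeout override of this kind might look roughly like the sketch below. It assumes the IBM Cloud Terraform provider accepts a `timeouts` block on `ibm_is_instance`; the `60m` value is an arbitrary illustration, not a tested setting, and the resource's existing arguments in vpc.tf are not reproduced here.

```hcl
# Sketch only: a fragment of the resource reported at vpc.tf line 427.
resource "ibm_is_instance" "master_candidate" {
  # ... existing arguments from vpc.tf stay unchanged ...

  # Assumed value: raise the create timeout above the 30m0s shown in the log.
  timeouts {
    create = "60m"
  }
}
```

Since the one successful run finished in under 5 minutes, a longer timeout may only delay the same error if the instance is genuinely stuck in the 'provisioning' state, so this is an experiment, not a confirmed fix.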

AugieMena3 commented 2 years ago

Good to hear! Thanks for that update @marconetto.

AugieMena3 commented 1 year ago

Closing this given it was not reproduced.