SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

Deployment in GCP error #860

Closed busetde closed 2 years ago

busetde commented 2 years ago

@yeoldegrove - Apologies for creating a new issue here. I am trying to deploy, but I got the errors below:

    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[3],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_638753395.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.hana_node.module.hana_provision.null_resource.provision[1],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_873561226.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_1573237755.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[2],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_278602173.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[1],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_1649281273.sh": Process exited with status 1
    ╵

Could you please kindly advise?

Regards - Budi

yeoldegrove commented 2 years ago

@busetde Again, I would need your terraform.tfvars and some more logs from you. The logs above do not show any concrete error message.

busetde commented 2 years ago

@yeoldegrove - Please kindly find below the terraform.tfvars

Let me know if there's any information needed...

yeoldegrove commented 2 years ago

@busetde What is the error message you get? The messages above are a bit generic ;)

busetde commented 2 years ago

@yeoldegrove - Are there any particular log files that I need to provide?

yeoldegrove commented 2 years ago

@busetde The complete salt output would be enough for the beginning.

busetde commented 2 years ago

@yeoldegrove - I've run the deployment again; below is the error:

    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Summary for local
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): -------------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Succeeded: 31 (changed=26)
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Failed: 14
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): -------------
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Total states run: 45
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Total run time: 1591.483 s
    module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Wed Jun 1 05:13:30 UTC 2022::default-vmnetweaver01::[ERROR] predeployment failed
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[2],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_153627534.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[3],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_1989768892.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_1671757330.sh": Process exited with status 1
    ╵
    ╷
    │ Error: remote-exec provisioner error
    │
    │   with module.netweaver_node.module.netweaver_provision.null_resource.provision[1],
    │   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
    │   78:   provisioner "remote-exec" {
    │
    │ error executing "/tmp/terraform_1361496486.sh": Process exited with status 1
    ╵

I've attached the salt* log from netweaver01

Let me know if there's any log that I can provide

tstaerk commented 2 years ago

Budi, nice to see you here. I gave you access to a draft guide on Terraform for SUSE for SAP on Google: https://docs.google.com/document/d/1VW30Yg9K1IcYmcAVXC-0M2F2PHugzGYmEMO_mFuFlVI/edit

busetde commented 2 years ago

Hi Thorsten,

Thanks a lot for sharing the documentation.

Regards, Budi

yeoldegrove commented 2 years ago

@busetde from your salt-result.log I can see that this was the issue:

  ----------
            ID: wait_until_nfs_is_ready_netweaver_node_sapmnt
      Function: cmd.run
          Name: until nc -zvw5 10.0.0.22 2049;do sleep 30;done
        Result: False
       Comment: Command "until nc -zvw5 10.0.0.22 2049;do sleep 30;done" run
       Started: 04:48:44.733423
      Duration: 1200008.712 ms
       Changes:
                ----------
                pid:
                    3710
                retcode:
                    1
                stderr:
                stdout:
                    until nc -zvw5 10.0.0.22 2049;do sleep 30;done : Timed out after 1200 seconds
  ----------

This points to a DRBD cluster that is not working...

Earlier today I released https://github.com/SUSE/ha-sap-terraform-deployments/releases/tag/8.1.4, which lets me successfully deploy on GCP with DRBD enabled. Please try again, and let's debug further in case your issue persists.
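For reference, the failing state `wait_until_nfs_is_ready_netweaver_node_sapmnt` only checks whether 10.0.0.22 accepts TCP connections on port 2049 (`nc -z`), using a 5 s connect timeout (`-w5`) and retrying every 30 s, and Salt aborts it after 1200 s. Below is a minimal Python sketch of that same polling logic, which could be used to test reachability from a NetWeaver node outside of Salt. The helper names `port_is_open` and `wait_for_tcp_port` are hypothetical and not part of the project.

```python
import socket
import time


def port_is_open(host: str, port: int, connect_timeout: float = 5.0) -> bool:
    """One-shot TCP check, roughly what `nc -z -w 5 HOST PORT` does."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or the connect attempt timed out.
        return False


def wait_for_tcp_port(host: str, port: int, overall_timeout: float = 1200.0,
                      interval: float = 30.0, connect_timeout: float = 5.0) -> bool:
    """Poll HOST:PORT until it accepts a connection or the deadline passes.

    Hypothetical helper mirroring `until nc -zvw5 HOST PORT; do sleep 30; done`
    running under a 1200 s overall timeout, as in the failing salt state.
    """
    deadline = time.monotonic() + overall_timeout
    while True:
        if port_is_open(host, port, connect_timeout):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep for the retry interval, but never past the deadline.
        time.sleep(min(interval, remaining))
```

For example, `wait_for_tcp_port("10.0.0.22", 2049)` returns `False` after roughly 20 minutes if the NFS export never becomes reachable; passing a larger `overall_timeout` is the equivalent of raising the timeout in the salt state.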

busetde commented 2 years ago

@yeoldegrove - I still got what looks like the same error:

[screenshot]

Please kindly advise if there's anything required for troubleshooting...

yeoldegrove commented 2 years ago

@busetde I'm still trying to reproduce your issue, with no luck so far. Is the issue maybe related to the DRBD cluster? Could you check or send the logs?

Another thing: our develop branch already includes some features to use Google Filestore as a backend for HANA scale-out deployments and also for NetWeaver (some code is still missing). Would you be interested in this feature?

busetde commented 2 years ago

@yeoldegrove - For DRBD, which logs are required?

yeoldegrove commented 2 years ago

@busetde basically /var/log/salt* from both DRBD nodes.

busetde commented 2 years ago

@yeoldegrove - The deployment is still in progress, but the DRBD deployment succeeded, as shown below:

[screenshot]

I will send /var/log/salt* when the deployment has finished...

busetde commented 2 years ago

@yeoldegrove - The deployment failed...

[screenshot]

I will send you the logs from both DRBD nodes...

busetde commented 2 years ago

@yeoldegrove - Please find attached the DRBD salt logs from both VMs.

Let me know if there's anything else...

yeoldegrove commented 2 years ago

@busetde The DRBD deployment seems to be successful from the logfiles.

Does the netcat command that failed in your screenshot above work now?

default-vmnetweaver01:~ # nc -zv 10.0.0.22 2049
Connection to 10.0.0.22 2049 port [tcp/nfs] succeeded!

How does crm_mon -r1 look on the DRBD nodes?

busetde commented 2 years ago

@yeoldegrove Here's the crm_mon -r1

From vmdrbd01

[screenshot: crm_mon -r1 output]

From vmdrbd02

[screenshot: crm_mon -r1 output]

Let me know if there's any information required...

yeoldegrove commented 2 years ago

These error messages are "fine" and related to https://github.com/SUSE/ha-sap-terraform-deployments/issues/839 / https://bugzilla.suse.com/show_bug.cgi?id=1198872.

@busetde What about the netcat command?

busetde commented 2 years ago

@yeoldegrove - It succeeded, as shown below:

[screenshot]

Any other information required?

Regards - Budi

yeoldegrove commented 2 years ago

To sum it up: everything DRBD-related is running, so this could be a timing issue.

@busetde To verify this, you could try tainting the NetWeaver provisioners and restarting the salt run with another apply:

terraform taint "module.netweaver_node.null_resource.netweaver_provisioner[0]"
terraform taint "module.netweaver_node.null_resource.netweaver_provisioner[1]"
terraform taint "module.netweaver_node.null_resource.netweaver_provisioner[2]"
terraform taint "module.netweaver_node.null_resource.netweaver_provisioner[3]"
terraform apply -auto-approve

and/or you could try raising the 1200 s (20 min) timeout here and doing a completely new deployment: https://github.com/SUSE/ha-sap-terraform-deployments/blob/main/salt/shared_storage/nfs.sls#L106

20 minutes could indeed be too low for some regions or certain sizings.
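Two hedged notes on the taint commands above. First, the resource addresses printed in the error output earlier in this issue are `module.netweaver_node.module.netweaver_provision.null_resource.provision[0]` to `[3]`, which differ from the addresses used in the taint commands; `terraform state list` shows which addresses actually exist in a given state. Second, since Terraform v0.15.2 `terraform taint` is deprecated in favour of `terraform apply -replace=ADDRESS`, which can be repeated to re-create several resources in one apply. The sketch below builds such a command in Python; the names `PROVISIONER_ADDRESSES`, `replace_command` and `rerun_provisioners` are hypothetical, and the addresses are assumed from the error output rather than confirmed against the repository.

```python
import subprocess
from typing import List, Sequence

# Assumption: addresses as printed in the terraform errors in this issue.
# Check them against `terraform state list` for your own deployment.
PROVISIONER_ADDRESSES = [
    f"module.netweaver_node.module.netweaver_provision.null_resource.provision[{i}]"
    for i in range(4)
]


def replace_command(addresses: Sequence[str], auto_approve: bool = True) -> List[str]:
    """Build a `terraform apply -replace=...` argv (requires Terraform >= 0.15.2)."""
    cmd = ["terraform", "apply"]
    if auto_approve:
        cmd.append("-auto-approve")
    # One -replace flag per resource instance to re-create.
    cmd += [f"-replace={addr}" for addr in addresses]
    return cmd


def rerun_provisioners(addresses: Sequence[str], workdir: str = ".") -> int:
    """Re-create the given null_resources so their provisioners run again."""
    # Passing an argv list (no shell) means the [N] brackets need no quoting.
    return subprocess.run(replace_command(addresses), cwd=workdir).returncode
```

Calling `rerun_provisioners(PROVISIONER_ADDRESSES)` from the Terraform working directory would be the single-command counterpart of the four taint calls followed by `terraform apply -auto-approve`.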

busetde commented 2 years ago

@yeoldegrove - After tainting, NetWeaver still fails... I'm proceeding to destroy, raise the timeout to 30 min and apply again, and will update you...

yeoldegrove commented 2 years ago

@busetde Just raise it to 120 min or something to be on the safe side.

busetde commented 2 years ago

@yeoldegrove - I've changed the timeout and enabled vpc_name, which I previously had commented out (# vpc_name = "slesnetwork"). The deployment completed successfully...

[screenshot]

I will close this issue... It is likely not caused by the timeout; I will give it another try later...

Thanks @yeoldegrove

yeoldegrove commented 2 years ago

@busetde Feel free to open an issue requesting a higher timeout if needed, and reference this one.

busetde commented 2 years ago

@yeoldegrove - All good: tested with the 20-minute timeout (1200 s). The cause was a wrong configuration of vpc_name and subnet...

Thanks - Budi
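Since the root cause turned out to be the vpc_name and subnet configuration, one cheap pre-flight check is to confirm that the addresses the nodes must reach, such as the NFS address 10.0.0.22 from the salt log above, fall inside the subnet range configured in terraform.tfvars. Below is a minimal sketch using Python's standard `ipaddress` module. The function name `addresses_outside_subnet` and the example CIDRs are hypothetical placeholders, not values from this deployment, and this check alone cannot detect a wrong VPC name.

```python
import ipaddress
from typing import Iterable, List


def addresses_outside_subnet(cidr: str, addresses: Iterable[str]) -> List[str]:
    """Return the addresses that do NOT belong to the subnet range `cidr`.

    Hypothetical helper: `cidr` is assumed to be the subnet range from your
    terraform.tfvars; `addresses` are the IPs the nodes are expected to use.
    """
    # strict=True raises ValueError if the CIDR has host bits set (e.g. 10.0.0.5/24).
    network = ipaddress.ip_network(cidr, strict=True)
    return [a for a in addresses if ipaddress.ip_address(a) not in network]


# Example: the NFS address from the salt log checked against two candidate ranges.
print(addresses_outside_subnet("10.0.0.0/24", ["10.0.0.22"]))  # → []
print(addresses_outside_subnet("10.0.1.0/24", ["10.0.0.22"]))  # → ['10.0.0.22']
```

A non-empty result suggests the subnet range and the configured addresses disagree, which is worth fixing before spending 20 minutes waiting on the NFS check.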