SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

GCP S/4HANA2020 DEPLOYMENT FAILED #804

Closed busetde closed 2 years ago

busetde commented 2 years ago

Used cloud platform: GCP

Used SLES4SAP version: SLES4SAP 15 SP2

Used client machine OS: Google Cloud Shell

Expected behavior vs. observed behavior: Deployment of S/4HANA 2020 failed with the error below:

Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1494841285.sh": Process exited with status 1

module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR ] Cannot perform 'http.query': https://gcp-netweaver01:7630/monitor?0 - [Errno 111] Connection refused
module.netweaver_node.module.netweaver_provision.null_resource.provision[0]: Still creating... [2h5m11s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [2h5m11s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR ] Cannot perform 'http.query': https://gcp-netweaver01:7630/monitor?0 - [Errno 111] Connection refused
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR ] Cannot perform 'http.query': https://gcp-netweaver01:7630/monitor?0 - [Errno 111] Connection refused
module.netweaver_node.module.netweaver_provision.null_resource.provision[0]: Still creating... [2h5m21s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1]: Still creating... [2h5m21s elapsed]
module.netweaver_node.module.netweaver_provision.null_resource.provision[1] (remote-exec): [ERROR ] Cannot perform 'http.query': https://gcp-netweaver01:7630/monitor?0 - [Errno 111] Connection refused

How to reproduce: Using the current master branch, start a new S/4HANA deployment with netweaver_product_id = "S4HANA2020.CORE.HDB.ABAPHA"

Troubleshooting steps: Attached salt_result.log, sapinst.log, and sapinst_dev.log from gcp-netweaver01.

Regards - Budi

busetde commented 2 years ago

@ab-mohamed

yeoldegrove commented 2 years ago

@busetde In your logs I can see a possible problem and solution for the issue.

sapinst.log

  The step checkBackupLocationsHANA with step key |NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_CreateDBandLoad|ind|ind|ind|ind|createdbandload|0|NW_Recovery_Install_HDB|ind|ind|ind|ind|recovery_install_hdb|0|checkBackupLocationsHANA was executed with status ERROR (Last error reported by the step: Caught ESAPinstException in module call: Validator of step '|NW_ABAP_DB|ind|ind|ind|ind|0|0|NW_CreateDBandLoad|ind|ind|ind|ind|createdbandload|0|NW_Recovery_Install_HDB|ind|ind|ind|ind|recovery_install_hdb|0|checkBackupLocationsHANA' reported an error:

This is going to be solved by https://github.com/SUSE/sapnwbootstrap-formula/pull/92.

Please use the latest develop branch of https://github.com/SUSE/ha-sap-terraform-deployments together with ha_sap_deployment_repo = "https://download.opensuse.org/repositories/network:/ha-clustering:/sap-deployments:/devel/" and give it a shot. If this works for you, we can forward port things to the master branch.

busetde commented 2 years ago

@yeoldegrove I followed your instructions, but got the error below:

module.netweaver_node.module.netweaver_provision.null_resource.provision[0] (remote-exec): Wed Jan 19 13:56:45 UTC 2022::default-vmnetweaver01::[ERROR] predeployment failed
╷
│ Warning: Version constraints inside provider configuration blocks are deprecated
│
│   on infrastructure.tf line 3, in provider "google":
│    3:   version = "~> 3.43.0"
│
│ Terraform 0.13 and earlier allowed provider version constraints inside the provider configuration block, but
│ that is now deprecated and will be removed in a future version of Terraform. To silence this warning, move the
│ provider version constraint into the required_providers block.
╵
╷
│ Error: Error creating ForwardingRule: googleapi: Error 400: Invalid value for field 'resource.IPAddress': '10.0.1.35'. Requested internal IP is outside the network/subnetwork range., invalid
│
│   with module.netweaver_node.module.netweaver-load-balancer-ers[0].google_compute_forwarding_rule.load-balancer-forwarding-rule,
│   on modules/load_balancer/main.tf line 54, in resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule":
│   54: resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule" {
│
╵
╷
│ Error: Error creating ForwardingRule: googleapi: Error 400: Invalid value for field 'resource.IPAddress': '10.0.1.34'. Requested internal IP is outside the network/subnetwork range., invalid
│
│   with module.netweaver_node.module.netweaver-load-balancer-ascs[0].google_compute_forwarding_rule.load-balancer-forwarding-rule,
│   on modules/load_balancer/main.tf line 54, in resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule":
│   54: resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule" {
│
╵
╷
│ Error: Error creating ForwardingRule: googleapi: Error 400: Invalid value for field 'resource.IPAddress': '10.0.1.20'. Requested internal IP is outside the network/subnetwork range., invalid
│
│   with module.drbd_node.module.drbd-load-balancer[0].google_compute_forwarding_rule.load-balancer-forwarding-rule,
│   on modules/load_balancer/main.tf line 54, in resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule":
│   54: resource "google_compute_forwarding_rule" "load-balancer-forwarding-rule" {
│
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1740884027.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[2],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_252719069.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1131825636.sh": Process exited with status 1

In my terraform.tfvars:

netweaver_app_server_count = 1
netweaver_ha_enabled = true

drbd_ips = ["10.0.0.20", "10.0.0.21"]
drbd_cluster_vip = "10.0.1.20"

netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]
netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.1.36", "10.0.1.37"]

So the virtual IPs 10.0.1.34 and 10.0.1.35 failed, but 10.0.1.36 was created [image of the 10.0.1.36 route attached].

Any advice?

yeoldegrove commented 2 years ago

@busetde since https://github.com/SUSE/ha-sap-terraform-deployments/pull/793 drbd_cluster_vip_mechanism and netweaver_cluster_vip_mechanism are set to load-balancer by default.

This means that the virtual IPs now have to be in the same subnet as the machines, or you need to switch the mechanism back to route.

Details can also be found in terraform.tfvars.

e.g.

#netweaver_ips = ["10.0.0.30", "10.0.0.31", "10.0.0.32", "10.0.0.33"]

# If "netweaver_cluster_vip_mechanism" is "load-balancer", the ASCS/ERS IP addresses must belong to the same subnet as the netweaver machines
#netweaver_virtual_ips = ["10.0.0.34", "10.0.0.35", "10.0.1.36", "10.0.1.37"]
# If "netweaver_cluster_vip_mechanism" is "route", the ALL netweaver IP addresses must NOT belong to the same subnet as the netweaver machines
#netweaver_virtual_ips = ["10.0.1.34", "10.0.1.35", "10.0.0.36", "10.0.0.37"]

Does it work if you change the IPs or the mechanism?
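
As a quick sanity check before re-running terraform, the subnet rule above can be sketched in Python. This is only an illustration: the subnet range `10.0.0.0/24` is an assumption inferred from the machine IPs in this thread (replace it with your actual subnetwork CIDR), and `vips_outside_subnet` is a hypothetical helper, not part of this project.

```python
import ipaddress


def vips_outside_subnet(subnet_cidr, virtual_ips):
    """Return the virtual IPs that do NOT fall inside subnet_cidr."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return [ip for ip in virtual_ips if ipaddress.ip_address(ip) not in subnet]


# Assumed subnet range; replace with the CIDR of your own subnetwork.
SUBNET = "10.0.0.0/24"

# ASCS/ERS virtual IPs from the failing terraform.tfvars
print(vips_outside_subnet(SUBNET, ["10.0.1.34", "10.0.1.35"]))
# ASCS/ERS virtual IPs from the "load-balancer" example above
print(vips_outside_subnet(SUBNET, ["10.0.0.34", "10.0.0.35"]))
```

With "load-balancer" the returned list should be empty for the ASCS/ERS addresses; with "route" every virtual IP should appear in it.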

busetde commented 2 years ago

Hi @yeoldegrove,

Thanks, that got me past the above issues. But now I'm hitting the issue below. Let me know if you need additional information.

module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Summary for local
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): -------------
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Succeeded: 14 (changed=11)
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Failed:     1
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): -------------
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Total states run:     15
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Total run time:   51.851 s
module.drbd_node.module.drbd_provision.null_resource.provision[0] (remote-exec): Thu Feb 10 12:27:27 UTC 2022::default-vmdrbd01::[ERROR] os setup failed 
╷
│ Warning: Version constraints inside provider configuration blocks are deprecated
│ 
│   on infrastructure.tf line 3, in provider "google":
│    3:   version     = "~> 3.43.0"
│ 
│ Terraform 0.13 and earlier allowed provider version constraints inside the provider
│ configuration block, but that is now deprecated and will be removed in a future version
│ of Terraform. To silence this warning, move the provider version constraint into the
│ required_providers block.
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_547932536.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.drbd_node.module.drbd_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_600692385.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.hana_node.module.hana_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1231432754.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.drbd_node.module.drbd_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1199164762.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_2086154685.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.hana_node.module.hana_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1738037983.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│ 
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[2],
│   on ../generic_modules/salt_provisioner/main.tf line 65, in resource "null_resource" "provision":
│   65:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1867630128.sh": Process exited with status 1

yeoldegrove commented 2 years ago

@busetde additional logs from /var/log/salt* would be needed now to debug this further.
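
One possible way to gather those files for attaching is sketched below. It assumes Python 3 is available on the node; `bundle_logs` and the archive path in the usage note are hypothetical and not part of this project.

```python
import glob
import os
import tarfile


def bundle_logs(pattern, archive_path):
    """Pack everything matching pattern into a gzip'd tar archive.

    Returns the sorted list of paths that were added.
    """
    paths = sorted(glob.glob(pattern))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            # Store entries under their base name so the archive stays flat.
            tar.add(path, arcname=os.path.basename(path))
    return paths
```

For example, `bundle_logs("/var/log/salt*", "/tmp/salt-logs.tar.gz")` run as root on each node would produce one archive per machine.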

busetde commented 2 years ago

@yeoldegrove Please find additional logs from /var/log/salt* for vmdrbd01, vmdrbd02, vmhana01, vmhana02, vmnetweaver01, vmnetweaver02, vmnetweaver03

Let me know if there's additional information required

yeoldegrove commented 2 years ago

The logs show that this new error is unrelated to both your original problem ("no S4HANA 2020 support") and the subsequent error ("new lb/route parameters"):

  ----------
            ID: workaround_payg_new_register
      Function: cmd.run
          Name: /usr/sbin/registercloudguest --force-new
        Result: False
       Comment: Attempt 1: Returned a result of "False", with the following comment: "Command "/usr/sbin/registercloudguest --force-new" run"
                Attempt 2: Returned a result of "False", with the following comment: "Command "/usr/sbin/registercloudguest --force-new" run"
                Command "/usr/sbin/registercloudguest --force-new" run
       Started: 21:45:27.765934
      Duration: 31060.121 ms
       Changes:
                ----------
                pid:
                    5302
                retcode:
                    1
                stderr:
                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
                        (self._dns_host, self.port), self.timeout, **extra_kw
                      File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
                        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
                      File "/usr/lib64/python3.6/socket.py", line 745, in getaddrinfo
                        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                    socket.gaierror: [Errno -2] Name or service not known

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
                        chunked=chunked,
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
                        self._validate_conn(conn)
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
                        conn.connect()
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
                        conn = self._new_conn()
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
                        self, "Failed to establish a new connection: %s" % e
                    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
                        timeout=timeout
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
                        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
                      File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
                        raise MaxRetryError(_pool, url, error or ResponseError(cause))
                    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='smt-gce.susecloud.net', port=443): Max retries exceeded with url: /connect/systems (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known',))

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/sbin/registercloudguest", line 216, in <module>
                        utils.remove_registration_data()
                      File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 1170, in remove_registration_data
                        'https://%s/connect/systems' % server_name, auth=auth_creds
                      File "/usr/lib/python3.6/site-packages/requests/api.py", line 161, in delete
                        return request('delete', url, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/api.py", line 61, in request
                        return session.request(method=method, url=url, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
                        resp = self.send(prep, **send_kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
                        r = adapter.send(request, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
                        raise ConnectionError(e, request=request)
                    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='smt-gce.susecloud.net', port=443): Max retries exceeded with url: /connect/systems (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known',))

The latest issue will be handled here: https://github.com/SUSE/ha-sap-terraform-deployments/issues/812
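
The traceback above bottoms out in `socket.gaierror: [Errno -2] Name or service not known` for `smt-gce.susecloud.net`, i.e. a name resolution failure on the node. A minimal sketch for checking resolution from Python follows; `can_resolve` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without network access.

```python
import socket


def can_resolve(hostname, port=443, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a DNS lookup failure."""
    try:
        resolver(hostname, port)
        return True
    except socket.gaierror:
        # Same exception class as at the bottom of the traceback above.
        return False
```

Calling `can_resolve("smt-gce.susecloud.net")` on an affected node would be expected to return `False` while the problem persists.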

busetde commented 2 years ago

Hi @yeoldegrove ,

I'm getting the error below:

module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Summary for local
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): -------------
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Succeeded: 41 (changed=34)
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Failed: 7
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): -------------
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Total states run: 48
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Total run time: 4278.050 s
module.netweaver_node.module.netweaver_provision.null_resource.provision[2] (remote-exec): Tue Feb 15 16:43:25 UTC 2022::default-vmnetweaver03::[ERROR] deployment failed
╷
│ Warning: Version constraints inside provider configuration blocks are deprecated
│
│   on infrastructure.tf line 3, in provider "google":
│    3:   version = "~> 3.43.0"
│
│ Terraform 0.13 and earlier allowed provider version constraints inside the provider configuration block, but that is
│ now deprecated and will be removed in a future version of Terraform. To silence this warning, move the provider
│ version constraint into the required_providers block.
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[0],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1924790741.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[2],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1766877353.sh": Process exited with status 1
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.netweaver_node.module.netweaver_provision.null_resource.provision[1],
│   on ../generic_modules/salt_provisioner/main.tf line 78, in resource "null_resource" "provision":
│   78:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1584056435.sh": Process exited with status 1

Kindly let me know which logs you need.

Regards - Budi

busetde commented 2 years ago

Hi @yeoldegrove

Successfully deployed S/4HANA 2020, hence closing the bug. Thanks for your support.
