SUSE / ha-sap-terraform-deployments

Automated SAP/HA Deployments in Public/Private Clouds
GNU General Public License v3.0

BYOS/PAYG public-cloud images do not register correctly #812

Closed yeoldegrove closed 2 years ago

yeoldegrove commented 2 years ago

Used cloud platform

AWS, GCP, Azure

Used SLES4SAP version

e.g. latest SLES15SP3

Expected behaviour vs observed behaviour

The infrastructure to register SUSE's BYOS images changed. For details look at: https://www.suse.com/c/byos-instances-and-the-suse-public-cloud-update-infrastructure/

At the moment, BYOS images throw errors in the os_setup phase (see below).

Update: PAYG images also seem to be affected.

How to reproduce

Deploy any AWS, GCP, or Azure BYOS or PAYG image.

e.g.

os_image = "suse-sap-cloud/sles-15-sp3-sap"
vs.
os_image = "suse-byos-cloud/sles-15-sp3-sap-byos"

Logs

  ----------
            ID: workaround_payg_new_register
      Function: cmd.run
          Name: /usr/sbin/registercloudguest --force-new
        Result: False
       Comment: Attempt 1: Returned a result of "False", with the following comment: "Command "/usr/sbin/registercloudguest --force-new" run"
                Attempt 2: Returned a result of "False", with the following comment: "Command "/usr/sbin/registercloudguest --force-new" run"
                Command "/usr/sbin/registercloudguest --force-new" run
       Started: 21:45:27.765934
      Duration: 31060.121 ms
       Changes:
                ----------
                pid:
                    5302
                retcode:
                    1
                stderr:
                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
                        (self._dns_host, self.port), self.timeout, **extra_kw
                      File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
                        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
                      File "/usr/lib64/python3.6/socket.py", line 745, in getaddrinfo
                        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
                    socket.gaierror: [Errno -2] Name or service not known

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
                        chunked=chunked,
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
                        self._validate_conn(conn)
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
                        conn.connect()
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
                        conn = self._new_conn()
                      File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
                        self, "Failed to establish a new connection: %s" % e
                    urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
                        timeout=timeout
                      File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
                        method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
                      File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
                        raise MaxRetryError(_pool, url, error or ResponseError(cause))
                    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='smt-gce.susecloud.net', port=443): Max retries exceeded with url: /connect/systems (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known',))

                    During handling of the above exception, another exception occurred:

                    Traceback (most recent call last):
                      File "/usr/sbin/registercloudguest", line 216, in <module>
                        utils.remove_registration_data()
                      File "/usr/lib/python3.6/site-packages/cloudregister/registerutils.py", line 1170, in remove_registration_data
                        'https://%s/connect/systems' % server_name, auth=auth_creds
                      File "/usr/lib/python3.6/site-packages/requests/api.py", line 161, in delete
                        return request('delete', url, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/api.py", line 61, in request
                        return session.request(method=method, url=url, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
                        resp = self.send(prep, **send_kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
                        r = adapter.send(request, **kwargs)
                      File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
                        raise ConnectionError(e, request=request)
                    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='smt-gce.susecloud.net', port=443): Max retries exceeded with url: /connect/systems (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe72f9ea470>: Failed to establish a new connection: [Errno -2] Name or service not known',))
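
The last traceback boils down to a DNS failure: the host smt-gce.susecloud.net cannot be resolved ([Errno -2] Name or service not known). A minimal Python sketch for checking this from an affected instance, using only the standard library. The helper name can_resolve and its injectable resolver parameter are illustrative assumptions and not part of registercloudguest:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a DNS lookup failure.

    `resolver` is injectable (hypothetical parameter) so the check can be
    exercised without network access.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same exception class as at the bottom of the first traceback.
        return False
```

Calling can_resolve("smt-gce.susecloud.net") on an instance that shows the error above would be expected to return False.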

busetde commented 2 years ago

@yeoldegrove - Does BYOS mean not using the cloud provider's license? If so, my deployment is actually using the GCP-provided SLES4SAP license. Is that understanding correct?

Regards - Budi

yeoldegrove commented 2 years ago

@busetde Yes, you are correct. It basically comes down to this parameter:

os_image = "suse-sap-cloud/sles-15-sp3-sap"
os_image = "suse-sap-cloud/sles-15-sp3-sap-v20211113"
vs.
os_image = "suse-byos-cloud/sles-15-sp3-sap-byos"
os_image = "suse-byos-cloud/sles-15-sp3-sap-byos-v20211113"
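
The naming pattern in the examples above can be turned into a quick check of which license model an os_image value points to. This is only a heuristic sketch based on the image names quoted in this thread; the function name license_model is hypothetical, and the cloud providers may name other images differently:

```python
def license_model(os_image):
    """Guess the license model from an image name like '<project>/<image>'.

    Heuristic only: it relies on the '-byos' marker seen in the example
    image names in this thread.
    """
    image = os_image.rsplit("/", 1)[-1]
    return "BYOS" if "-byos" in image else "PAYG"


print(license_model("suse-sap-cloud/sles-15-sp3-sap"))
print(license_model("suse-byos-cloud/sles-15-sp3-sap-byos-v20211113"))
```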

Could you supply the exact image version you are using? This way, I can make sure to include it in my tests.

My current assumption is that the workaround of removing all registration files and running registercloudguest --force-new is not needed anymore. The cleanup before the workaround would also affect you if you use a GCP-provided license. See: https://github.com/SUSE/ha-sap-terraform-deployments/blob/e38064dbd4e102e47c7065262ac215935b5e9a20/salt/os_setup/registration.sls#L66-L81

busetde commented 2 years ago

@yeoldegrove,

I'm using the following:

*_os_image = "suse-sap-cloud/sles-15-sp2-sap"

Hope that helps.

yeoldegrove commented 2 years ago

@busetde This should be fixed by #815, which is in develop now.

busetde commented 2 years ago

Hi @yeoldegrove,

Supposedly there is a line in main.tf mentioning that an IP address needs to be outside the subnetwork CIDR range.

I am getting the error below:

│ Error: Error creating instance: googleapi: Error 400: Invalid value for field 'resource.networkInterfaces[0].networkIP': '10.0.1.32'. Requested internal IP is outside the subnetwork CIDR range., invalid
│
│   with module.netweaver_node.google_compute_instance.netweaver[2],
│   on modules/netweaver_node/main.tf line 108, in resource "google_compute_instance" "netweaver":
│  108: resource "google_compute_instance" "netweaver" {

Regards - Budi

yeoldegrove commented 2 years ago

@busetde The error messages come directly from the Google API. This looks like the loadbalancer vs. route config change again that I explained in https://github.com/SUSE/ha-sap-terraform-deployments/issues/804#issuecomment-1033810226. Is that the case, and can it be solved by "using loadbalancer and changing IPs" or "switching back to route"?
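
The condition the Google API complains about can be checked locally before running terraform apply. A minimal sketch using Python's standard ipaddress module; 10.0.1.32 is the address from the error message, while both subnet ranges below are hypothetical values chosen only for illustration, since the actual subnetwork CIDR is not shown in this thread:

```python
import ipaddress


def ip_in_subnet(ip, cidr):
    """Return True if `ip` lies inside the network given by `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# 10.0.1.32 is the address from the error message.
# The subnet ranges are assumptions, not taken from the deployment.
print(ip_in_subnet("10.0.1.32", "10.0.0.0/24"))  # False
print(ip_in_subnet("10.0.1.32", "10.0.1.0/24"))  # True
```

If the check returns False for the subnet range actually configured, the requested internal IP would be rejected in the same way as in the error above.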

busetde commented 2 years ago

Hi Eike,

I experienced this error previously, changed my configuration to follow main.tf in the develop branch (before the PAYG changes), and no longer saw the error. But now I see that the code in develop has changed so that it is the same as in the master branch.

Could you confirm whether there was such a change?

Regards - Budi

On Tue, Feb 15, 2022, 3:50 PM Eike Waldt wrote:

@busetde The error messages come directly from the Google API. This looks like the loadbalancer vs. route config change again that I explained in https://github.com/SUSE/ha-sap-terraform-deployments/issues/804#issuecomment-1033810226. Is that the case, and can it be solved by "using loadbalancer and changing IPs" or "switching back to route"?

yeoldegrove commented 2 years ago

The master branch does not have the fixes from #815 yet. There is an open PR #816 for this. So the "image issue" is still present in master.