IBM / python-sdk-core

The python-sdk-core repository contains core functionality required by Python code generated by the IBM OpenAPI SDK Generator.
Apache License 2.0

TLS handshake error #144

Closed sstegmueller closed 1 year ago

sstegmueller commented 2 years ago

We use IBM Python SDK Core to update our database periodically. Unfortunately, the following SSL error occurs at irregular intervals, and so far we have not been able to figure out why this happens:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1040, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1131)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Max retries exceeded with url: /identity/token (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1131)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/www/ibm/client.py", line 110, in describe_hosted_connections
    self.direct_link_provider.list_provider_gateways().get_result()
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_networking_services/direct_link_provider_v2.py", line 137, in list_provider_gateways
    request = self.prepare_request(method='GET',
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/base_service.py", line 409, in prepare_request
    self.authenticator.authenticate(request)
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/authenticators/iam_request_based_authenticator.py", line 63, in authenticate
    bearer_token = self.token_manager.get_token()
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/token_managers/token_manager.py", line 81, in get_token
    self.paced_request_token()
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/token_managers/token_manager.py", line 131, in paced_request_token
    token_response = self.request_token()
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/token_managers/iam_request_based_token_manager.py", line 115, in request_token
    response = self._request(
  File "/usr/local/lib/python3.8/site-packages/ibm_cloud_sdk_core/token_managers/jwt_token_manager.py", line 93, in _request
    response = requests.request(
  File "/usr/local/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Max retries exceeded with url: /identity/token (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_INTERNAL_ERROR] tlsv1 alert internal error (_ssl.c:1131)')))

We have already updated the library from 3.5.2 to 3.15.3, hoping that #139 would solve the problem, but no luck.

We recently enabled verbose logging, which is recommended in the project's readme. We also added logging of the IP address of the receiving server, as we suspect that perhaps some specific IBM machines are causing the problems. Since adding the verbose logging, the incident has not occurred, but I will update this issue with new information.
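For reference, verbose HTTP logging and logging of the resolved server addresses can be sketched with the standard library alone. This is a minimal sketch and not necessarily the exact procedure from the project's readme: the function names `enable_verbose_http_logging` and `resolve_addresses` are hypothetical, and the list only shows what DNS returns at that moment, which may differ from the address a later request actually connects to.

```python
import http.client
import logging
import socket

logger = logging.getLogger("tls_debug")  # hypothetical logger name


def enable_verbose_http_logging():
    """Turn on debug output for http.client and the logging module."""
    logging.basicConfig(level=logging.DEBUG)
    # http.client prints request/response headers when debuglevel > 0.
    http.client.HTTPConnection.debuglevel = 1


def resolve_addresses(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses DNS currently gives."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    logger.debug("%s:%s currently resolves to %s", hostname, port, addresses)
    return addresses
```

Calling `resolve_addresses("iam.cloud.ibm.com")` right before each SDK call would leave a record of the candidate addresses next to any later SSL error in the log.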

Environment:
Docker Image=python:3.8-slim-buster
Python Version=3.8.13
ibm-cloud-networking-services=0.17.2
ibm-cloud-sdk-core=3.15.3
api_date_version=2020-06-02

I would be grateful for any help.

pyrooka commented 2 years ago

Hi @StefanStegmueller! Do you have any updates on this issue, such as new occurrences or logs? I've looked into it, but haven't found a solution yet. Some say proxies can randomly screw up certificates or the handshake in rare cases, but I think that’s not the case here. I suspect something is happening with the certificates (obviously…), but I don’t really know what. Besides the logs, the OpenSSL and TLS versions might be useful as well.
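The client-side OpenSSL and TLS details can be collected with Python's `ssl` module. The following is a minimal sketch; the helper name `tls_client_info` is hypothetical, and it reports only what the local interpreter supports, not what the server negotiates.

```python
import ssl


def tls_client_info():
    """Collect the OpenSSL build and TLS version range of a default context."""
    context = ssl.create_default_context()
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "has_tlsv1_2": ssl.HAS_TLSv1_2,
        "has_tlsv1_3": ssl.HAS_TLSv1_3,
        # TLSVersion enum members; the names are easier to read in a report.
        "minimum_version": context.minimum_version.name,
        "maximum_version": context.maximum_version.name,
    }


if __name__ == "__main__":
    for key, value in tls_client_info().items():
        print(f"{key}: {value}")
```

Running this inside the same Docker image as the failing service gives the values to attach to the issue.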

padamstx commented 2 years ago

A Google search for "python tlsv1 alert internal error (_ssl.c:1131)" yields a few Stack Overflow posts that seem similar to the problem reported above. One possible reason for the error is a mismatch between the TLS versions accepted by the client and by the server. I'm not sure whether this applies here.

pyrooka commented 2 years ago

Yeah, I found that too. But in that case the connection shouldn't even be established once, because this is not a thing that changes often. Or am I missing something?

padamstx commented 2 years ago

> Yeah, I found that too. But in that case the connection shouldn't even be established once, because this is not a thing that changes often. Or am I missing something?

I think it depends on how the server side is implemented. Is it a single endpoint, or multiple server instances fronted by a gateway, with perhaps not all of the instances configured correctly? It might not be likely, but it's possible. And if something obvious were causing this, I'd expect the problem to have been reported before now.

pyrooka commented 2 years ago

Hmm, you could be right. If there is, e.g., a load balancer configured with SSL passthrough, that can cause this issue, assuming there is a misconfigured server behind it (as you mentioned).

marsangr commented 2 years ago

Thanks for your contributions. I am working with @StefanStegmueller on debugging this, and I can confirm that it is still occurring. The target hostnames that trigger the error,

directlink.cloud.ibm.com
iam.cloud.ibm.com

do indeed have changing IP addresses, in what looks like DNS-based load distribution. Unfortunately, we see no pattern here: addresses that sometimes fail also succeed at other times. There might additionally be some IP anycast or a load balancer in action behind a given address, as @pyrooka mentions.

Do we have any debugging mechanisms in python-sdk-core to try to pin this down? I'd especially be interested in seeing which specific TLS handshake details lead to the exception. Other ideas are welcome.
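Independently of the SDK, one way to look at handshake details per address is to resolve a hostname and attempt a TLS handshake against each returned IP, recording the negotiated version and cipher or the error. This is a rough sketch using only the standard library; the names `probe_tls` and `probe_all` are hypothetical. Since the failure is intermittent, a successful probe does not rule out a faulty backend behind the same address.

```python
import socket
import ssl


def probe_tls(ip, server_hostname, port=443, timeout=5.0):
    """Attempt one TLS handshake against a specific IP and report the outcome."""
    context = ssl.create_default_context()
    result = {"ip": ip, "ok": False, "tls_version": None,
              "cipher": None, "error": None}
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw_sock:
            # server_hostname supplies SNI and is used for certificate checks.
            with context.wrap_socket(
                raw_sock, server_hostname=server_hostname
            ) as tls_sock:
                result["ok"] = True
                result["tls_version"] = tls_sock.version()
                result["cipher"] = tls_sock.cipher()[0]
    except OSError as exc:
        # ssl.SSLError, timeouts and refused connections all derive from OSError.
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


def probe_all(hostname, port=443, timeout=5.0):
    """Probe every address that DNS currently returns for the hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    return [probe_tls(ip, hostname, port, timeout) for ip in addresses]
```

Running `probe_all("iam.cloud.ibm.com")` on a schedule and logging the results would show whether failures cluster on particular addresses or TLS versions.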

pyrooka commented 2 years ago

@marsangr @StefanStegmueller First off, sorry for the late answer! Do you have any update on this?

> Do we have any debugging mechanisms in python-sdk-core to try to pin this down?

There is nothing else in the core that could help you track down this issue. It's coming from a lower level, so I think all you can do is log the requests. Manually editing the code to add some extra debugging (I can't think of anything specific right now) could also be helpful. Let me know if you have or need anything!
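As one possible form of extra debugging on the caller's side, the chained exceptions seen in the traceback can be walked to find the underlying `ssl.SSLError` and log its fields. This is only a sketch, assuming the low-level error is reachable through `__cause__` or `__context__` as the traceback above suggests; the helper name `find_ssl_error` is hypothetical.

```python
import ssl


def find_ssl_error(exc):
    """Walk an exception chain and describe the first ssl.SSLError found."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against cycles in the chain
        if isinstance(exc, ssl.SSLError):
            return {
                # getattr: these attributes may be absent on some instances.
                "reason": getattr(exc, "reason", None),
                "library": getattr(exc, "library", None),
                "errno": exc.errno,
                "message": str(exc),
            }
        exc = exc.__cause__ or exc.__context__
    return None
```

Wrapping the SDK call in `try`/`except Exception as exc` and logging `find_ssl_error(exc)` together with a timestamp would keep the low-level details in one log line per failure.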

pyrooka commented 1 year ago

Closing this for lack of activity. Feel free to re-open if the issue still persists.