exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License
72 stars 39 forks source link

Connection from pyexasol to Exasol SaaS #91

Closed ghost closed 2 years ago

ghost commented 2 years ago

I cannot connect to Exasol SaaS using this syntax:

C = pyexasol.connect(dsn="DB_connection_string",
                     user="DB_user",
                     refresh_token="DB_PAT"
                     )

but using "password" instead of "refresh_token" works fine

C = pyexasol.connect(dsn="DB_connection_string",
                     user="DB_user",
                     password="DB_PAT"
                     )
littleK0i commented 2 years ago

What is the error message? Could you also try to enable debug=True and post logs here?

Let's take a look.

ghost commented 2 years ago

Error Message: Could not connect to Exasol: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1124)

debug=True does only repeat this error message multiple times

littleK0i commented 2 years ago

Aha, got it. Using refresh_token enables proper certificate verification. https://github.com/exasol/pyexasol/blob/master/pyexasol/connection.py#L700-L703

Is Exasol SAAS certificate valid in the first place? Does it work with JDBC / ODBC drivers?

ghost commented 2 years ago

Seems as if the certificate could not be found on mac. I managed to connect if I also use certifi and ssl, maybe that helps to understand the root cause

import pyexasol
import certifi
import ssl

C = pyexasol.connect(dsn=conf.get("DB_connection_string"),
                     user=conf.get("DB_user"),
                     refresh_token=conf.get("DB_PAT"),
                     websocket_sslopt = {
                        "cert_reqs": ssl.CERT_REQUIRED,
                        "ca_certs": certifi.where()
                        }
                     )
littleK0i commented 2 years ago

Is JDBC / ODBC working on this machine with Exasol SAAS?

In theory, it might be a problem if system default certificate authority does not recognise SSL certificates generated for Exasol SAAS.

Also, if we enforce certifi, it will potentially mess up with any custom "on premise" SSL setups which many people might currently have.

ghost commented 2 years ago

I can connect using DBeaver on the same machine

littleK0i commented 2 years ago

I've checked it with a small SaaS testing cluster and did not detect any issues.

Also, I've checked certificates returned by Exasol:

{   'OCSP': ('http://r3.o.lencr.org',),
    'caIssuers': ('http://r3.i.lencr.org/',),
    'issuer': ((('countryName', 'US'),), (('organizationName', "Let's Encrypt"),), (('commonName', 'R3'),)),
    'notAfter': 'Jun 15 10:13:31 2022 GMT',
    'notBefore': 'Mar 17 10:13:32 2022 GMT',
    'serialNumber': '03E49B62D405173CF0B2F41BABB9D58290B8',
    'subject': ((('commonName', '*.clusters.exasol.com'),),),
    'subjectAltName': (('DNS', '*.clusters.exasol.com'),),
    'version': 3}
depth=2 C = US, O = Internet Security Research Group, CN = ISRG Root X1
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = *.clusters.exasol.com
verify return:1

The whole chain looks OK at the first glance.

I suspect you might be having problems specifically with your Python or OS installation. In theory, all other hosts using "Let's Encrypt" should not work for you and return the same error.

Could you try to connect to another host using standard Python?

For example, this host: https://letsencrypt.org/ And this Python package: https://docs.python.org/3/library/urllib.request.html#module-urllib.request

Also, could you ask a colleague to try to reproduce this error? I think, another machine will be ok.

littleK0i commented 2 years ago

A sample code I've used for testing, including hostname, but not credentials.

import pyexasol
import ssl

import pprint
printer = pprint.PrettyPrinter(indent=4, width=140)

with pyexasol.connect(dsn='p34ibbbqujamtclijzpo5pmsni.clusters.exasol.com', user='xxx', refresh_token='xxx') as C:
    # Basic select
    stmt = C.execute("SELECT * FROM exa_all_users LIMIT 5")
    printer.pprint(stmt.fetchall())
luchughes commented 2 years ago

@exatobr @littleK0i did you find a solution to this? I have a similar issue in a client's network environment where any of the pyexasol http transport methods just hang indefinitely or time out. execute() works fine and Dbeaver works fine from the same machine but any method like export_to_pandas() fails.

This is a real headscratcher to me. Does http transport utilise a different port or something?

  "driverName": "PyEXASOL 0.24.0",
  "clientName": "PyEXASOL",
  "clientVersion": "0.24.0",
  "clientOs": "Windows-10-10.0.19044-SP0",
  "clientOsUsername": "lhughes",
  "clientRuntime": "Python 3.9.12",
  "useCompression": true,
2022-07-14 17:49:09.432 [WebSocket request #3]
{
  "command": "getAttributes"
}
2022-07-14 17:49:09.549 [WebSocket response #3]
{
  "status": "ok",
  "attributes": {
    "datetimeFormat": "YYYY-MM-DD HH24:MI:SS.FF6",
    "dateFormat": "YYYY-MM-DD",
    "numericCharacters": ".,",
    "dateLanguage": "ENG",
    "queryTimeout": 0,
    "timezone": "EUROPE/LONDON",
    "timezoneBehavior": "INVALID SHIFT AMBIGUOUS ST",
    "snapshotTransactionsEnabled": true,
    "defaultLikeEscapeCharacter": "\\",
    "autocommit": true,
    "compressionEnabled": true,
    "currentSchema": "ANALYTICS",
    "openTransaction": 0
  }
}
2022-07-14 17:49:09.749 [WebSocket request #4]
{
  "command": "execute",
  "sqlText": "EXPORT analytics.test_for_transport INTO CSV\nAT 'https://192.169.143.58:34045' FILE '000.gz'\nWITH COLUMN NAMES"
}

This is the point where it hangs.

littleK0i commented 2 years ago

It looks like another issue, not related to certificates.

Single-threaded HTTP transport uses the same port to connect: https://github.com/exasol/pyexasol/blob/8b3d0f2a483fab9383ea5987e89fcba73cdf17a5/pyexasol/connection.py#L305

The only difference is that it uses IP address to connect to specific node instead of hostname.

Could you try running the same code from different machine? Maybe Mac / Linux, or from an actual server. With Windows too many things can go wrong.

luchughes commented 2 years ago

Thanks for the very quick response! I've tried from 2 different machines within the client network and get the same result in both, but I did try from another laptop on a different network and it works:

2022-07-14 17:49:27.065 [WebSocket request #3]
{
  "command": "getAttributes"
}
2022-07-14 17:49:27.178 [WebSocket response #3]
{
  "status": "ok",
  "attributes": {
    "datetimeFormat": "YYYY-MM-DD HH24:MI:SS.FF6",
    "dateFormat": "YYYY-MM-DD",
    "numericCharacters": ".,",
    "dateLanguage": "ENG",
    "queryTimeout": 0,
    "timezone": "EUROPE/LONDON",
    "timezoneBehavior": "INVALID SHIFT AMBIGUOUS ST",
    "snapshotTransactionsEnabled": true,
    "defaultLikeEscapeCharacter": "\\",
    "autocommit": true,
    "compressionEnabled": true,
    "currentSchema": "ANALYTICS",
    "openTransaction": 0
  }
}
2022-07-14 17:49:27.305 [WebSocket request #4]
{
  "command": "execute",
  "sqlText": "EXPORT (\nSelect * from test_for_transport\n) INTO CSV\nAT 'https://192.169.143.25:38263' FILE '000.gz'\nWITH COLUMN NAMES"
}
2022-07-14 17:49:28.618 [WebSocket response #4]
{
  "status": "ok",
  "responseData": {
    "results": [
      {
        "resultType": "rowCount",
        "rowCount": 3
      }
    ],
    "numResults": 1
  }
}
2022-07-14 17:49:28.622 [WebSocket request #5]
{
  "command": "disconnect"
}
2022-07-14 17:49:28.655 [WebSocket response #5]
{
  "status": "ok"
}
2022-07-14 17:49:28.655 [WebSocket connection close]

I'm a bit baffled as all machines have IPv4 addresses in whitelisted ranges and ODBC and pyexasol execute() behaviour is identical, it's just http transport that is having this issue.

luchughes commented 2 years ago

Also, I ran from a kubernetes pod CentOS linux environment also within the client's network and got the same issue to the first one (although here I have to use a http proxy and it seems to hang slightly earlier on in the process.)

littleK0i commented 2 years ago

How about exajload or standard Exasol JDBC driver? Does it work for IMPORT (from local file)? Internally, JDBC driver should use the same approach.

What if you try to connect to Exasol using IPv4 address in DSN instead of hostname?

Does client have some fancy "protection" software? HTTP transport is a little bit odd. It does not use SSL immediately after connection. Instead, it sends and receives some un-encrypted data, and waits for incoming request from Exasol. It might be considered suspicious for software which tries to analyse network packets and figure out what is going on.

littleK0i commented 2 years ago

Also, you may always install "pyexasol" locally for development using pip install -e . and start debugging around this piece of code: https://github.com/exasol/pyexasol/blob/master/pyexasol/http_transport.py#L369-L382

This is where initial connection happens.

And this is the response handler: https://github.com/exasol/pyexasol/blob/master/pyexasol/http_transport.py#L473-L494

luchughes commented 2 years ago

Thanks for the pointers - I'll continue to debug with these and let you know if I have any success!

luchughes commented 2 years ago

Hi @littleK0i - Still struggling with this one and trying to debug with our network security team.

The IP address and port in the "sqlText": "EXPORT (\nSelect * from test_for_transport\n) INTO CSV\nAT 'https://192.169.143.25:38263' FILE '000.gz'\nWITH COLUMN NAMES" statement seems to vary quite a lot.

Sometimes it's "https://192.169.143.58:39777", or "https://192.169.143.25:44141".

Are these all local to Exasol?

littleK0i commented 2 years ago

This is IP in Exasol private network.

When pyexasol (or JDBC) connects and sends a special packet, Exasol creates some sort of "tunnel" in its private network routed into HTTP transport connection. The address + port of this tunnel is returned to client and is used while building IMPORT or EXPORT sql query.

Normally, you may ignore these addresses. Actual public network connectivity uses public IP address and port.

luchughes commented 2 years ago

Ah okay, that's really helpful clarification - thanks.

ghost commented 2 years ago

I suggest to close this issue

littleK0i commented 2 years ago

Let's close it, I agree. Let's encrypt root certs should be largely available on all machines nowadays.

I don't think it will be widespread issue. But if it does, we can always reopen.