dropbox / PyHive

Python interface to Hive and Presto. 🐝
Other
1.67k stars 549 forks source link

SSL presto #203

Open parisni opened 6 years ago

parisni commented 6 years ago

Hi

I am trying to connect to a https presto endpoint, with basic auth. I could'nt find a way to specify a certifficate file, or a pem file. I actually dig into the source code, and the only way I got working was by doing this:

from sqlalchemy.engine import create_engine
from requests.auth import HTTPBasicAuth

engine = create_engine(
        'presto://<user>@<host>:<port>/hive',
        connect_args={
            'protocol': 'https',
            'requests_kwargs':{'auth': HTTPBasicAuth('<user>','<password>'),  'verify':False}
            }
        )

I also tried that :

from sqlalchemy.engine import create_engine
from requests.auth import HTTPBasicAuth

engine = create_engine(
        'presto://<user>@<host>:<port>/hive',
        connect_args={
            'protocol': 'https',
            'requests_kwargs':{'auth': HTTPBasicAuth('<user>','<password>')
,  'verify':True, 'cert':'/path/to/file.pem'}
            }
        )

But I got a ssl error.

Is there any way to pass the certificate ?

Thanks

devinstevenson commented 6 years ago

@parisni The basic usage over HTTPS is

from sqlalchemy.engine import create_engine
engine = create_engine(
        'presto://<user>:<password>@<host>:<port>/hive',
        connect_args={'protocol': 'https'})

This will automatically use HttpBasicAuth. Note that you must include the password in your connection string. Also this will automatically use SSL. requests package relies on certifi package. Keep those libs up to date.

If you need to use your own SSL Cert Verification, pass in the path to the cert like this

from sqlalchemy.engine import create_engine
engine = create_engine(
        'presto://<user>:<password>@<host>:<port>/hive',
        connect_args={'protocol': 'https', 
                                  'requests_kwargs': {'verify': '/path/to/certfile'}
 }
)

For Client Side Certificates

from sqlalchemy.engine import create_engine
engine = create_engine(
        'presto://<user>:<password>@<host>:<port>/hive',
        connect_args={'protocol': 'https', 
                                  'requests_kwargs': {'cert': ('/path/client.cert', '/path/client.key')}
 }
)

reference: http://docs.python-requests.org/en/master/user/advanced/

mtdeguzis commented 6 years ago

Hmm I wonder if this would work with Hive in the same manor

tooptoop4 commented 5 years ago

bump, can we connect to SSL (TLS1.2) protected HiveServer2?

safuente commented 4 years ago

I have the same issue, I need to pass the following properties (apart from user and password) that are usually configured in driver properties in a presto sql client:

SSL = true SSLTrustStorePassword = SSLTrustStorePath = socksProxy = localhost:8081

Is this possible in anyway using create engine function?

tjdodson commented 4 years ago

I have the same issue, I need to pass the following properties (apart from user and password) that are usually configured in driver properties in a presto sql client:

SSL = true SSLTrustStorePassword = SSLTrustStorePath = socksProxy = localhost:8081

Is this possible in anyway using create engine function?

@safuente

I have the same issue. Were you ever able to figure this out?

sheepdogapproved commented 3 years ago

I have the same issue, I need to pass the following properties (apart from user and password) that are usually configured in driver properties in a presto sql client: SSL = true SSLTrustStorePassword = SSLTrustStorePath = socksProxy = localhost:8081 Is this possible in anyway using create engine function?

@safuente

I have the same issue. Were you ever able to figure this out?

Same here, need to pass these same arguments. Bump?

rherasm commented 3 years ago

Hi, I'm fighting with the same type of problem, using HttpBasicAuth and forwarding my local port 8888 to the target host:port Presto DB, as:

engine_presto=create_engine( 'presto://user:pass@localhost:8888/hive', connect_args={ 'protocol': 'https',
'session_props': {'query_max_run_time': '1234m'} } )

this returns the error: SSLError(SSLCertVerificationError("hostname 'localhost' doesn't match '*.myproject.mycompany.com

Including my .pem certificate as part of the connect_args={'requests_kwargs': {'verify': ('D:/myPrivateKey.pem')}, ...} also returns: _OpenSSL.SSL.Error: [('x509 certificate routines', 'X509_load_cert_crlfile', 'no certificate or crl found')]

Please, do you know any specific property (args) for avoiding this problem? thanks for your comments. cheers.

SrinivasGuntupalli commented 2 years ago

@safuente

I have the same issue. Were you ever able to figure this out? if yes, can you share the code snippet?

rherasma commented 2 years ago

I solved that issue in this way, hope this helps, cheers.

Environmental variables required: SSL = true SOCKS_PROXY=localhost:8083 (local port mapped through the tunnel)

Engine through ODBC driver setup The first option defines a DB connection engine using SQLAlchemy through the ODBC driver.

This requires a LocalForward through .ssh/config file.

from sqlalchemy import create_engine

engine_presto_pro=create_engine( 'presto://user.name:password@localhost:8888/hive', connect_args={ 'protocol': 'https', 'requests_kwargs': {'verify': False}, 'session_props': {'query_max_run_time': '1234m'}, 'http_headers': {'X-Presto-Time-Zone': 'Europe/Paris' } } ) Optionally 'http_headers' parameter applies automatically the TimeZone transformation to timestamps data.

Engine through Socks5 proxy setup And the second option for the DB engine definition provides a connection engine using SQLAlchemy through socks5.

This options requires the enviroment variables SOCKS_PROXY and SSL defined.

from sqlalchemy import create_engine; from requests.auth import HTTPBasicAuth

engine_presto_pro=create_engine( 'presto://user.name:password@presto-standard-es.myproy.com:8888/hive', connect_args={ 'protocol': 'https', 'requests_kwargs': {'proxies': {'https':'socks5://localhost:8083'} } })

SrinivasGuntupalli commented 2 years ago

@rherasma in your above example where you are passing the below variables in the create_engine? and my issue is need to connect to presto by passing the below driver properties along with the username and password.

SSL = true SSLTrustStorePassword = SSLTrustStorePath =

rherasma commented 2 years ago

Sorry because I don't have all the code available here right now, I used it many months ago, but having a quick look I used this other dbapi option, hope this helps as well:

import prestodb

conn = prestodb.dbapi.connect(
    host=DBhost,
    port=DBport,
    user=DBuser,
    catalog=DBcatalog,
    schema=DBschema,
    http_scheme='https',
    auth=prestodb.auth.BasicAuthentication(DBuser, DBpass)
MasonMa-sy commented 2 years ago

@rherasma in your above example where you are passing the below variables in the create_engine? and my issue is need to connect to presto by passing the below driver properties along with the username and password.

SSL = true SSLTrustStorePassword = SSLTrustStorePath =

I have two solutions about this.

'verify': False

We can set verify to False, just like this

cursor = presto.connect(host='localhost', port=8443, username='test', source='ssl_test', protocol='https',
                        requests_kwargs={'verify': False, 'auth': HTTPBasicAuth('test', 'password')}).cursor()

The result is

anaconda3\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
anaconda3\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
anaconda3\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
anaconda3\lib\site-packages\urllib3\connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'localhost'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
[('ads',), ('default',), ('dev',), ('dim',), ('dwd',), ('dws',), ('flink',), ('information_schema',), ('ods',), ('test',), ('test_success',), ('tmp',)]

Process finished with exit code 0

We can get the correct result, but with some warnings.

use pem file

First we need to know that TrustStore or KeyStore is used by java. For python we use pem. We can thansform the TrustStore file to pem.

# truststore to p12
keytool -v -importkeystore -srckeystore presto-trust.jks -srcalias presto -destkeystore presto-trust.p12 -deststoretype PKCS12
# p12 to pem
openssl  pkcs12 -in presto-trust.p12 -out presto-trust.pem

Then the code is

cursor = presto.connect(host='localhost', port=8443, username='test', source='ssl_test', protocol='https',
                        requests_kwargs={'verify': r"C:\cert\presto-trust.pem", 'auth': HTTPBasicAuth('test', 'password')}).cursor()

The result has no warning.

[('ads',), ('default',), ('dev',), ('dim',), ('dwd',), ('dws',), ('flink',), ('information_schema',), ('ods',), ('test',), ('test_success',), ('tmp',)]

Process finished with exit code 0

It is worth noting that when we generate the keystore, we need to define the subjectAltName. Otherwise there will be another warning subjectAltName.

keytool \
 -keystore presto-keystore.jks  -storepass password -alias presto \
 -genkeypair -keyalg RSA -validity 1825  \
 -dname "CN=localhost,O=myorganization,OU=myou,L=mylocation,ST=California,C=US" \
 -ext "SAN=IP:10.100.0.1,IP:192.168.0.1,DNS:myserver.mydomain.com,DNS:localhost,EMAIL:name@mydomain.com"

One of IP, or DNS in "ext "SAN=" neet to be set to the unqualified hostname of the Presto coordinator.

iris-qq commented 1 year ago

@rherasma in your above example where you are passing the below variables in the create_engine? and my issue is need to connect to presto by passing the below driver properties along with the username and password.

SSL = true SSLTrustStorePassword = SSLTrustStorePath =

has any solution? i have the same question. @safuente @tjdodson @sheepdogapproved