catherinedevlin / ipython-sql

%%sql magic for IPython, hopefully evolving into full SQL client
MIT License
1.79k stars 369 forks source link

How to connect hive/presto with connect_args? #149

Open cqcn1991 opened 5 years ago

cqcn1991 commented 5 years ago

I can connect hive/presto using create_engine like this

from sqlalchemy.engine import create_engine
conn = create_engine(
    'presto://user@host:port',
    connect_args={'protocol': 'https',
                  'requests_kwargs': {
    'auth': HTTPBasicAuth('user', 'pw'),
    'verify': '/cert.pem',
    }}
)
df = pd.read_sql("SELECT * FROM dw.my_table", conn)

However, if I use ipython-sql, how should I connect?

From the doc, it seems only support string passed to create_engine, something like this

%%sql postgresql://will:longliveliz@localhost/shakes
select * from character
where abbrev = 'ALICE

How can I specify additional configs in connect_args?

ysixsix commented 5 years ago

Hi cqcn1991,

Using ipython-sql, you could achieve https connection as below:

  1. Load sql magic: %reload_ext sql %config SqlMagic.autocommit=False # for engines that do not support autommit

  2. Specify protocol as https at the end of connection string as below: %sql presto://user:pswd@host:port/hive?protocol=https

Hope this would work for you!

cqcn1991 commented 5 years ago

@ysixsix Hi, thanks for your reply. I'm still a bit confused. where should I put the connect_args? There're some other params besides https protocol

image

ysixsix commented 5 years ago

@cqcn1991 Hi, I was specifically looking for using https as protocol, and this is achieved by setting the protocol to https as below as param:

%sql presto://user:pswd@host:port/hive?protocol=https

If you are also looking for setting other connect_args, you could try using similar approach, e.g

%sql presto://user:pswd@host:port/hive?protocol=https&auth=HTTPBasicAuth('user', 'pw')&verify=/cert.pem

I haven't tried setting the auth and verify params though. Hope it may help.

l4z41 commented 4 years ago

The verify param cannot be handled: StatementError: (builtins.TypeError) __init__() got an unexpected keyword argument 'verify'

michalrudko commented 4 years ago

I have the same case - need to pass some Kerberos-specific args in my URL, but it seems there is no easy way to handle this. Have anyone found some solution or workaround to this topic? My args and a SQL Alchemy create_engine call (which works):

args = {'protocol': 'https',\
          'KerberosRemoteServiceName': os.environ['KERBEROS_REMOTES_SERVICE_NAME'],\
          'KerberosConfigPath':os.environ['KERBEROS_CONFIG_PATH'],\
          'KerberosPrincipal': principal,\
          'KerberosCredentialCachePath': f'/tmp/krb5cc_{UID}',\
          'requests_kwargs': {'verify': False}\
        }
engine = create_engine(presto_url,connect_args=args) 

and my attempt with %sql magics URL:

%sql presto://host:port?protocol=https&KerberosRemoteServiceName=presto&KerberosConfigPath=/etc/krb5.conf&KerberosPrincipal=myKerberosPrincipal&KerberosCredentialCachePath=/tmp/krb5cc_myUID&requests_kwargs.verify=False

the error message I am getting is:

StatementError: (builtins.ValueError) Protocol must be https when passing a password
[SQL: show tables;]

Which indicates that it has not processed the args correctly...

kacecode commented 4 years ago

My coworker and I had similar needs. We've provided a PR supporting this use-case. https://github.com/catherinedevlin/ipython-sql/pull/152