dropbox / PyHive

Python interface to Hive and Presto. 🐝

Kerberos (hive/presto) access documentation #174

Open parisni opened 6 years ago

parisni commented 6 years ago

Hi

Apparently (#47, #91) Kerberized access is available. However, there is no example of how to use it in the documentation.

That would be more than helpful.

Thanks

ExpressGit commented 6 years ago

Hi,

Where can we access the documentation? We need it.

Thanks

Dubrzr commented 6 years ago

Looking at https://github.com/dropbox/PyHive/blob/master/pyhive/hive.py, here is how:

sudo apt-get install libsasl2-dev libsasl2-2 libsasl2-modules-gssapi-mit
pip install PyHive sasl thrift_sasl

Be sure to have Kerberos configured in /etc/krb5.conf, and kinit with your keytab.

With pyhive:

from pyhive import hive

engine = hive.Connection(host="<hive-host>", port=<hive-port>, username="<kerberos-username>", database='<db-name>', auth='KERBEROS', kerberos_service_name="hive")
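
For completeness, a minimal query against that connection could look like the following sketch (the table name is just a placeholder):

cursor = engine.cursor()
cursor.execute('SELECT * FROM my_table LIMIT 10')  # placeholder table name
print(cursor.fetchall())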

With sqlalchemy:

from sqlalchemy import *

engine = create_engine("hive://<kerberos-username>@<hive-host>:<hive-port>/<db-name>", connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
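
A quick smoke test against that engine might then look like this sketch (a trivial query, assuming the engine above connects; the exact execute call may differ slightly across SQLAlchemy versions):

with engine.connect() as conn:
    # trivial query just to confirm the Kerberos handshake works
    result = conn.execute(text('SELECT 1'))
    print(result.fetchall())
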
hellofuturecyj commented 5 years ago

If the keytab must be kinit'ed before connecting, does that mean I have to run the PyHive code on the Hadoop cluster?

Dubrzr commented 5 years ago

Nope, you can kinit from a remote computer (through ports 88 tcp+udp) and then do remote pyhive. We do just that.

samarth-goel-guavus commented 5 years ago

Nope, you can kinit from a remote computer (through ports 88 tcp+udp) and then do remote pyhive. We do just that.

@Dubrzr: Would it be possible to give an example or point to a link where you've done this?

Dubrzr commented 5 years ago

@samarth-goel-guavus

  1. Install kerberos on your own computer
  2. Setup kerberos on your computer so that it connects to the remote kerberos server (/etc/krb5.conf)
  3. Given a keytab file (provided by your kerberos administrator), you can authenticate your computer to the remote kerberos server using kinit -kt your.keytab username@YOUR_KERBEROS_REALM
  4. You can check that you have a valid kerberos ticket using klist
  5. You can now launch pyhive with kerberos.
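
Putting those steps together, a minimal end-to-end sketch could look like the following (host, keytab path and principal are placeholders, 10000 is just the default HiveServer2 port, and running kinit in a shell beforehand works just as well as calling it from Python):

import subprocess
from pyhive import hive

# obtain a Kerberos ticket from the keytab (assumes MIT Kerberos client tools are installed)
subprocess.run(['kinit', '-kt', '/path/to/your.keytab', 'username@YOUR_KERBEROS_REALM'], check=True)
# optional: klist exits non-zero if no valid ticket was obtained
subprocess.run(['klist'], check=True)

# connect to HiveServer2 using the ticket in the default credential cache
conn = hive.Connection(host='<hive-host>', port=10000,
                       auth='KERBEROS', kerberos_service_name='hive')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
print(cursor.fetchall())
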
dmueller1607 commented 4 years ago

Is there also a solution for Windows? The Hive connection example above does not seem to work on my Windows client (kinit has worked fine for years on my PC). The Hive server log says: java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream

abiwill commented 4 years ago

@samarth-goel-guavus

  1. Install kerberos on your own computer
  2. Setup kerberos on your computer so that it connects to the remote kerberos server (/etc/krb5.conf)
  3. Given a keytab file (provided by your kerberos administrator), you can authenticate your computer to the remote kerberos server using kinit -kt your.keytab username@YOUR_KERBEROS_REALM
  4. You can check that you have a valid kerberos ticket using klist
  5. You can now launch pyhive with kerberos.

How can we implement this in a Docker setup? How would a user add a Hive data source through the Superset UI in this case?

bkyryliuk commented 4 years ago

@Dubrzr thanks for providing examples here. it would be nice if you could add this information to the readme.

lsgrep commented 3 years ago

@samarth-goel-guavus

  1. Install kerberos on your own computer
  2. Setup kerberos on your computer so that it connects to the remote kerberos server (/etc/krb5.conf)
  3. Given a keytab file (provided by your kerberos administrator), you can authenticate your computer to the remote kerberos server using kinit -kt your.keytab username@YOUR_KERBEROS_REALM
  4. You can check that you have a valid kerberos ticket using klist
  5. You can now launch pyhive with kerberos.

I did all of this, but it did not work. However, creating a cache file & setting the KRB5CCNAME env variable did the trick for me.

import os, subprocess

# run kinit against a dedicated credential cache (running it in a shell also works)
cmd = f'kinit -kt {keytab_file} -c {ccache_file} {principal}'
subprocess.run(cmd, shell=True, check=True)
# point the Kerberos libraries at that cache before opening the connection
os.environ['KRB5CCNAME'] = ccache_file
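
With the environment variable set, the connection itself then follows the same pattern as earlier in the thread; a minimal sketch, with host and port as placeholders:

from pyhive import hive

# the GSSAPI layer picks up the ticket from the cache named in KRB5CCNAME
conn = hive.Connection(host='<hive-host>', port=10000,
                       auth='KERBEROS', kerberos_service_name='hive')
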
ghost commented 3 years ago

@lsgrep Can you provide the full code snippet? It would be very helpful.

sushma1918 commented 1 year ago

@samarth-goel-guavus

  1. Install kerberos on your own computer
  2. Setup kerberos on your computer so that it connects to the remote kerberos server (/etc/krb5.conf)
  3. Given a keytab file (provided by your kerberos administrator), you can authenticate your computer to the remote kerberos server using kinit -kt your.keytab username@YOUR_KERBEROS_REALM
  4. You can check that you have a valid kerberos ticket using klist
  5. You can now launch pyhive with kerberos.

I am following all these steps but getting this error: thrift.transport.TTransport.TTransportException: Bad status: 3 (b'GSS initiate failed')