cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

impyla should gracefully(?) use the correct auth mech between Impala and Hive #87

Open laserson opened 9 years ago

laserson commented 9 years ago

Follow on to @lskuff's comment in #78.

Note: a1053ce fixes Impyla so that queries can run against both Hive and Impala. A follow-on change is needed to allow Impyla to connect to HS2 in all security modes (set via the 'hive.server2.authentication' configuration). Specifically, impyla will now work with unsecured (NOSASL) connections and should also work with Kerberized connections (SASL GSSAPI mechanism). HiveServer2's default transport is actually PLAIN SASL, which does not currently work. This will be addressed in a follow-on change.
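To make the modes above concrete, here is a minimal sketch of how a 'hive.server2.authentication' value lines up with the transport the client has to speak. The table and the `mechanism_for` helper are hypothetical illustrations, not part of impyla; the NONE and LDAP rows assume HiveServer2's usual behaviour of running both over PLAIN SASL.

```python
# Hypothetical helper, NOT part of impyla: maps a HiveServer2
# 'hive.server2.authentication' setting to the client-side transport.
HS2_AUTH_TO_MECHANISM = {
    "NOSASL": "NOSASL",    # raw transport, no SASL handshake
    "NONE": "PLAIN",       # HiveServer2 default: PLAIN SASL
    "LDAP": "PLAIN",       # credentials travel over PLAIN SASL
    "KERBEROS": "GSSAPI",  # Kerberized connections
}


def mechanism_for(hs2_auth):
    """Return the SASL mechanism name for a server-side auth setting."""
    key = hs2_auth.strip().upper()
    if key not in HS2_AUTH_TO_MECHANISM:
        raise ValueError("unsupported hive.server2.authentication: %r" % hs2_auth)
    return HS2_AUTH_TO_MECHANISM[key]
```

The point of the table is that the client cannot pick the transport on its own: it has to match whatever the server was configured with.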

laserson commented 9 years ago

cc @szehon if you're gonna knock this out.

laserson commented 9 years ago

More notes from Lenni

So I found the problem with PLAIN SASL: the username and password must be set and non-empty or the connection fails, even though they are not really used. Enabling LDAP forces impyla to use PLAIN SASL but doesn't actually use LDAP anywhere for authentication. This allows the connection to succeed against Hive.

For example, this now works to connect to Hive using PLAIN SASL:

conn = connect(host='vd0214.halxg.cloudera.com', port=10000, use_ldap=True, ldap_user='user', ldap_password='pass')
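A small sketch of the same workaround, wrapped so that the username and password can never end up empty. The `plain_sasl_kwargs` helper and its "anonymous" placeholder are hypothetical; the keyword names (`use_ldap`, `ldap_user`, `ldap_password`) are the ones quoted in the comment above and may be spelled differently in other impyla releases.

```python
# Hypothetical helper, NOT part of impyla. It only builds keyword arguments;
# the keyword names are taken from the connect() call quoted in this thread.
def plain_sasl_kwargs(host, port=10000, user=None, password=None):
    """Build connect() kwargs that force PLAIN SASL with non-empty credentials."""
    return {
        "host": host,
        "port": port,
        "use_ldap": True,  # forces the PLAIN SASL transport
        # PLAIN SASL fails on empty values, so fall back to a placeholder
        # (the "anonymous" default is an assumption for illustration).
        "ldap_user": user or "anonymous",
        "ldap_password": password or "anonymous",
    }


# Usage against a live HiveServer2 (not executed here):
#   from impala.dbapi import connect
#   conn = connect(**plain_sasl_kwargs("vd0214.halxg.cloudera.com"))
```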

szehon commented 9 years ago

Thanks, @lskuff mentioned he is also interested in taking a look.

lskuff commented 9 years ago

Patch submitted here: https://github.com/cloudera/impyla/pull/93

lskuff commented 9 years ago

It is hard to automatically detect whether the server is Hive or Impala because you need to connect before calling any RPCs that might help.
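Since the server type cannot be queried before the transport is up, one possible workaround is to try candidate settings in order and keep the first that connects. This is only a sketch of that idea, not what impyla does: `connect_with_fallback` is a hypothetical name, and the connect function is passed in so the logic can be exercised without a server.

```python
# Hypothetical sketch, NOT part of impyla: try several connection settings
# in order, because the right mechanism cannot be detected up front.
def connect_with_fallback(connect_fn, attempts):
    """Return (connection, kwargs) for the first kwargs set that connects.

    connect_fn: a callable such as impala.dbapi.connect.
    attempts:   a list of keyword-argument dicts, tried in order.
    """
    errors = []
    for kwargs in attempts:
        try:
            return connect_fn(**kwargs), kwargs
        except Exception as exc:  # broad on purpose: error types vary by transport
            errors.append((kwargs, exc))
    raise ConnectionError("all connection attempts failed: %r" % (errors,))
```

A caller might list a NOSASL attempt first and a PLAIN SASL attempt second. Note that a transport mismatch may show up as a hang rather than a clean error, so any real use would likely need a connection timeout as well.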