cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

How to connect to a kerberos cluster from external network #199

Open robinisme2 opened 8 years ago

robinisme2 commented 8 years ago

Hi everyone,

I set up a Kerberos cluster with Cloudera Manager 5.7.0 and it works fine. The following picture shows the architecture of my cluster: kerberos cluster architecture (https://drive.google.com/file/d/0B0pjVE_bxylxem4wOEdTZzBGQTQ/view?usp=sharing)

However, when I try to connect to my cluster from an external network with the impyla API to run a query,

connect(host='10.36.174.38', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')

it fails, and the error is

thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server impala/10.36.174.38@ROBINISME2.COM not found in Kerberos database)

As the error says, my cluster doesn't have the principal "impala/10.36.174.38@ROBINISME2.COM". I have "impala/dn-3-1@ROBINISME2.COM" and "impala/dn-3-2@ROBINISME2.COM", but I can't connect to the hosts dn-3-1 and dn-3-2 directly.
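
The mismatch can be illustrated with a small sketch. It assumes the client requests a ticket named service/host@REALM built from the host string passed to connect(), which is what the error message above suggests. Real Kerberos libraries may also canonicalize the host through DNS, and this sketch ignores that. The helper name requested_principal is hypothetical.

```python
def requested_principal(service, host, realm):
    """Build the service principal name a client would request (simplified)."""
    return f"{service}/{host}@{realm}"


# Principals that exist in the KDC, as listed in the post.
known_principals = {
    "impala/dn-3-1@ROBINISME2.COM",
    "impala/dn-3-2@ROBINISME2.COM",
}

by_ip = requested_principal("impala", "10.36.174.38", "ROBINISME2.COM")
by_name = requested_principal("impala", "dn-3-1", "ROBINISME2.COM")

print(by_ip in known_principals)    # False: no principal is registered for the IP
print(by_name in known_principals)  # True: the hostname matches a principal
```

Under this assumption, connecting by IP address can only succeed if a principal for that exact name exists, or if the client can resolve the address to a name that has one.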

To reach a datanode running the Impala daemon, I set up HAProxy on my proxy server:

listen impala :21050
    mode tcp
    option tcplog
    balance leastconn
    server dn-3-1 dn-3-1:21050
    server dn-3-2 dn-3-2:21050

I also followed all the instructions on this page: http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_proxy.html#proxy_kerberos

But the error remains. What can I do to reach my goal? Is it possible to connect to a Kerberos cluster with the impyla API and run queries from an external network?

lordjc commented 8 years ago

At first glance this appears to be a DNS issue. Kerberos actually does a forward and reverse lookup on the host.

You can check the reverse lookup with dig -x.
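
The same forward and reverse check can be sketched in Python with the standard socket module. This is only a rough equivalent of the dig check, not what the Kerberos library does internally, and the function name forward_reverse_lookup is hypothetical.

```python
import socket


def forward_reverse_lookup(host):
    """Resolve host to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name); reverse_name is None when the reverse
    lookup fails.
    """
    ip = socket.gethostbyname(host)  # forward lookup (an IP string is returned as-is)
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]  # reverse lookup
    except OSError:  # socket.herror and socket.gaierror are subclasses
        reverse_name = None
    return ip, reverse_name


if __name__ == "__main__":
    # Replace the address with the host you pass to connect().
    print(forward_reverse_lookup("127.0.0.1"))
```

If the reverse name is None, or differs from the hostname in the service principal, that would be consistent with the DNS problem described above.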

derwin12 commented 7 years ago

I am facing this same issue. [update] Found the issue shortly after. I ran

# klist

which showed that my ticket was hive/[fullhostname]@DOMAIN. I then updated my connect string to:

conn = connect(host='[fullhostname]', port=10000, auth_mechanism='GSSAPI', kerberos_service_name='hive')

But my host doesn't know what [fullhostname] is, so I added

[ip]  [fullhostname]

to /etc/hosts.
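
The step from the klist output to the connect string can be sketched as follows. The helper connect_kwargs_from_principal and the principal hive/hs2.example.com@EXAMPLE.COM are hypothetical placeholders; the keyword names are the ones used in the connect() calls in this thread.

```python
def connect_kwargs_from_principal(principal, port):
    """Split 'service/host@REALM' into keyword arguments for connect()."""
    service, rest = principal.split("/", 1)
    host, _, _realm = rest.partition("@")
    return {
        "host": host,                      # must match the host part of the principal
        "port": port,
        "auth_mechanism": "GSSAPI",
        "kerberos_service_name": service,  # 'hive' or 'impala' in this thread
    }


# Hypothetical principal, standing in for the one klist reports.
kwargs = connect_kwargs_from_principal("hive/hs2.example.com@EXAMPLE.COM", 10000)
print(kwargs["host"], kwargs["kerberos_service_name"])

# With impyla installed and a valid ticket, the call would then be:
#   from impala.dbapi import connect
#   conn = connect(**kwargs)
```

The client machine still has to resolve the hostname, which is what the /etc/hosts entry provides.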

ufukomer commented 6 years ago

I can run Kerberos-enabled Impala from CDH using impala-shell, but I cannot connect through impyla:

connect(host='127.0.0.1', port=21050, auth_mechanism='GSSAPI', kerberos_service_name='impala')

Any update here?

xfs1010 commented 5 years ago

Changing the host to bigdata1.DEMO.COM works for me. bigdata1.DEMO.COM is my Hive cluster's address in the hosts file.