cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

impyla with LDAP auth encounter a hang during executing #392

Open dacoolbaby opened 4 years ago

dacoolbaby commented 4 years ago

We are using impyla v0.16.1 on both Windows and Linux to connect to Impala with LDAP auth. We noticed that impyla with LDAP auth requires the thrift_sasl module, and we suspect that something goes wrong there.

Queries run fine without LDAP auth, but with LDAP auth they hang until the timeout.

Finally, we tried pyhive instead, and all queries returned normally with or without LDAP.

Please have a look.

timarmstrong commented 4 years ago

@dacoolbaby there's probably not enough information here for us to reproduce the problem. Can you report the version of Impala that you're running, the OS versions, and an example of how you're connecting? If you can get a backtrace of where the Python thread is hanging too, that would be helpful.
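
For readers who need to capture such a backtrace, here is a minimal sketch using the standard-library faulthandler module. The helper name run_with_hang_dump and the sleep stand-in are hypothetical; in practice the callable would wrap the call that hangs, such as cursor.fetchall().

```python
import faulthandler
import tempfile
import time


def run_with_hang_dump(func, timeout_s, log_file):
    """Run func(); if it is still running after timeout_s seconds, write the
    stack of every thread to log_file. The process itself keeps running."""
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # time.sleep stands in for the hanging call (assumption for illustration).
    with tempfile.TemporaryFile(mode="w+") as log:
        run_with_hang_dump(lambda: time.sleep(0.5), timeout_s=0.1, log_file=log)
        log.seek(0)
        print(log.read())
```

The dump shows where each thread was blocked at the moment the timeout fired, which is different from the traceback raised after the client gives up.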

dacoolbaby commented 4 years ago

More details as follows:

Impala version: CDH-6.3.0-1.cdh6.3.0.p0.1279813
Linux version: 3.10.0-1062.4.1.el7.x86_64
CentOS Linux release 7.7.1908

Running over JDBC with the Java driver works. And now pyhive hangs too.

The Python-side traceback is as follows; the process was stopped by the timeout.

Traceback (most recent call last):
  File "C:/Users/daxin/PycharmProjects/Demo/impala_demo/hive_ldap_demo2.py", line 19, in <module>
    results = cursor.fetchall()
  File "C:\Users\daxin\AppData\Local\Programs\Python\Python37\lib\site-packages\impala\hiveserver2.py", line 535, in fetchall
    return list(self)
  File "C:\Users\daxin\AppData\Local\Programs\Python\Python37\lib\site-packages\impala\hiveserver2.py", line 583, in next
    convert_types=self.convert_types)
  File "C:\Users\daxin\AppData\Local\Programs\Python\Python37\lib\site-packages\impala\hiveserver2.py", line 1242, in fetch
    resp = self._rpc('FetchResults', req)
  File "C:\Users\daxin\AppData\Local\Programs\Python\Python37\lib\site-packages\impala\hiveserver2.py", line 994, in _rpc
    err_if_rpc_not_ok(response)
  File "C:\Users\daxin\AppData\Local\Programs\Python\Python37\lib\site-packages\impala\hiveserver2.py", line 748, in err_if_rpc_not_ok
    raise HiveServer2Error(resp.status.errorMessage)
impala.error.HiveServer2Error: Invalid query handle: 604c103afc754c09:3e40f2d100000000

Thanks!!

 


dacoolbaby commented 4 years ago

Now the hang only occurs on Windows. We will run more tests to confirm.


dacoolbaby commented 4 years ago

OK, we solved it by running the SELECT without using "". We found that a SQL query using "" causes the execution to hang with LDAP auth. But I think it is still a bug.

dacoolbaby commented 4 years ago

We still encounter the hang in the production environment. When the SQL selects table columns, it just hangs. But a count query returns fine.

dacoolbaby commented 4 years ago

We finally solved the problem by replacing the dependency modules with the following versions:

thriftpy 0.3.9
thriftpy2 0.4.10
thrift 0.9.3
thrift-sasl 0.2.1
sasl 0.2.1
impyla 0.16.2

noklam commented 3 years ago

@dacoolbaby Did you figure out what the issue was? I am getting the same error on Windows.

I tried to install the library versions you mentioned and got this error.

I am connecting with connect(host='my_host', port=21050, timeout=20, user=user, password=password, auth_mechanism='PLAIN')

    message=("Could not start SASL: %s" % self.sasl.getError()))
thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'")

anth0ny-x commented 3 years ago

My query was hanging with both auth_mechanism='LDAP' and auth_mechanism='PLAIN'. I solved it by downgrading thrift-sasl from 0.4.2 to 0.2.1 after seeing dacoolbaby's comment.
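
For anyone trying to reproduce this with both mechanisms, a rough sketch follows. The helpers build_connect_kwargs and fetch_all are hypothetical names, and the port and timeout defaults are simply the values from the connect() call quoted earlier in the thread; only impala.dbapi.connect and the DB API cursor methods come from impyla itself.

```python
def build_connect_kwargs(host, user, password, auth_mechanism="LDAP",
                         port=21050, timeout=20):
    """Collect the keyword arguments passed to impala.dbapi.connect()."""
    return {
        "host": host,
        "port": port,
        "timeout": timeout,  # seconds; lets a hung call fail instead of blocking forever
        "user": user,
        "password": password,
        "auth_mechanism": auth_mechanism,
    }


def fetch_all(sql, **connect_kwargs):
    """Open a connection, run one query, and return all rows."""
    # Imported lazily so this file can be loaded without impyla installed.
    from impala.dbapi import connect

    conn = connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder host and credentials; no connection is opened here.
    for mechanism in ("LDAP", "PLAIN"):
        kwargs = build_connect_kwargs("my_host", "user", "password", mechanism)
        print(mechanism, sorted(kwargs))
```

Running fetch_all with each set of arguments against the same query makes it easier to tell whether the hang depends on the mechanism or on the installed thrift-sasl version.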

noklam commented 3 years ago

@anth0ny-x can you share your library versions, e.g. the output of pip list?

It seems only a specific version works.

Are you able to connect stably? I can connect, but it often hangs after a few queries.

anth0ny-x commented 3 years ago

I'm running Ubuntu 18.04.4 LTS and my pip list is:

impyla (0.16.2)
sasl (0.2.1)
thrift (0.13.0)
thrift-sasl (0.2.1)
thriftpy (0.3.9)
thriftpy2 (0.4.11)

hellomikelo commented 3 years ago

Downgrading thrift-sasl 0.4.2 --> 0.2.1 worked for me.
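
To check an environment against the versions reported as working in this thread, something like the following sketch may help. The names REPORTED_WORKING, installed_version, and find_mismatches are made up for illustration; the pins are the three packages on which both version lists above agree, and they are not an official compatibility matrix.

```python
from importlib import metadata

# Versions that both reports in this thread have in common (an observation
# from the comments above, not a statement from the impyla maintainers).
REPORTED_WORKING = {
    "impyla": "0.16.2",
    "sasl": "0.2.1",
    "thrift-sasl": "0.2.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that does not match."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(REPORTED_WORKING).items():
        print(f"{package}: reported working {wanted}, installed {found}")
```

No output means the three packages match the versions reported here; the lookup parameter exists only so the comparison can be exercised without touching the real environment.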