Open gseva opened 6 years ago
We are periodically getting this same error and haven't found a solution. Some ideas: HiveServer2 is set to the HTTP transport instead of binary, or the server is for some reason severing the connection...?
@mauza We had to rollback to an earlier version of pyhive. These are the versions we're using: thrift==0.9.3 thrift_sasl==0.2.1 pyhive[hive]==0.2.1
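If you want to pin that exact rollback combination, the install would look something like this (package names and versions taken from the comment above; not verified against newer servers):

```shell
# Pin the older, known-working stack described above
pip install 'thrift==0.9.3' 'thrift_sasl==0.2.1' 'pyhive[hive]==0.2.1'
```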
Thanks for sharing those versions. Looks like there might be some breaking changes in the version of pyhive we were using, but I'll work through those tomorrow. Maybe I should start a new thread because our problem is fairly intermittent and doesn't require a long running hive insert query...
Same issue here. Long SQL statements simply freeze up for me and I eventually get that timeout. I'm running under Docker with debian:stretch.
Same issue here, after about 10 minutes of running.
I'm also struggling with the same issue.
Same issue here
Is there any update on this? Getting this issue.
Any update?
Same issue
Same issue. Hive 2.3.3.
I changed my Hive port from 22 to 10000 and it works; maybe that helps you.
Thanks @gseva, that saved my day.
I encountered the same problem when I used auth='NOSASL'. Then I changed to auth='NONE' and hit another problem: 'TSaslClientTransport' object has no attribute 'readAll'. The latter is because the default installed thrift_sasl (0.2.1) is not compatible with Python 3. After upgrading it, the problem was resolved.
final configs: python: 3.6 pyhive: 0.6.2 thrift: 0.13.0 thrift_sasl: 0.4.2
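To reproduce that working combination, the pins would be roughly (versions from the comment above; Python 3.6+ assumed):

```shell
# Pin the versions reported to work with Python 3.6
pip install 'pyhive[hive]==0.6.2' 'thrift==0.13.0' 'thrift_sasl==0.4.2'
```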
Was this problem fixed in latest version?
Here's a way to prevent the connection being reset/dropped on long-running queries -
(solved) Hive - connection dropped before job is done in Hadoop
https://github.com/dropbox/PyHive/issues/358
https://github.com/dropbox/PyHive/tree/v0.6.2#db-api-asynchronous Running the SQL query asynchronously prevents the connection from being dropped on queries that take a long time.
FYI - if using an ORM (like peewee / SQLAlchemy), get the cursor from the "database" object. Example in peewee: database.get_cursor()
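The asynchronous pattern from the linked README section can be sketched like this. The polling helper below is my own wrapping (not part of PyHive), the state codes are assumed to mirror PyHive's TCLIService TOperationState enum, and the host/query in the commented usage are placeholders:

```python
import time

# Terminal HiveServer2 operation states. Values assumed to mirror the
# TOperationState enum in PyHive's TCLIService bindings
# (FINISHED=2, CANCELED=3, CLOSED=4, ERROR=5); verify against your
# generated thrift code.
TERMINAL_STATES = {2, 3, 4, 5}


def wait_for_operation(poll, interval=5, sleep=time.sleep):
    """Call poll() until it returns a terminal state; return that state.

    `poll` is any zero-argument callable returning the current
    operation-state code (e.g. lambda: cursor.poll().operationState).
    Polling keeps traffic on the connection, instead of blocking in a
    single long execute() call that can hit the socket timeout.
    """
    while True:
        state = poll()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)


# Hypothetical usage against a live HiveServer2 (not runnable here):
#
#   from pyhive import hive
#   conn = hive.Connection(host="hiveserver", port=10000)  # placeholder host
#   cur = conn.cursor()
#   cur.execute("INSERT INTO t SELECT * FROM big_table", async_=True)
#   final = wait_for_operation(lambda: cur.poll().operationState)
#   rows = cur.fetchall() if final == 2 else None  # 2 == FINISHED
```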
The configs above (pyhive 0.6.2, thrift 0.13.0, thrift_sasl 0.4.2) also worked on Python 3.8.
I'm running a long-ish insert query in Hive using PyHive 0.6.1 and it fails with
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
after about 5 minutes of running. On the server side the query keeps running until it finishes successfully. I don't have this problem with fast queries. The environment in which this happens is a Docker container based on python:3.6-slim. Among other things, I'm installing the libsasl2-dev and libsasl2-modules packages, and the pyhive[hive] Python package. I can't reproduce it locally on my Mac with the same Python version: the code correctly waits until the query finishes. Any clue why this is happening? Thanks in advance.
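For reference, the environment described can be recreated roughly like this (a sketch of assumed commands; the actual Dockerfile isn't shown in the issue):

```shell
# Approximation of the reported environment
docker run -it python:3.6-slim bash
# inside the container:
apt-get update && apt-get install -y libsasl2-dev libsasl2-modules
pip install 'pyhive[hive]==0.6.1'
```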
The code I'm using is:
This is the full traceback: