Querying Hive, I get this error even with a trivially small table:
Error in sasl_decode (-1) SASL(-1): generic failure: Unable to find a callback: 32775
I'm using Hive 2.3.4-amzn-0. As the name indicates, it's Amazon EMR.
If I add 'LIMIT 100' to the query, it is fine. But asking for more than about 300 rows gives the error consistently.
I can reproduce it with the TPC-H benchmark tables, which are simple strings, integers, and floats, and not very wide.
I can reproduce it with the table being internal or external or a view.
If I remove as_pandas and just try to get one row with fetchone() directly on the cursor, that also fails immediately, which I thought was interesting. But again, if I add 'LIMIT 100', fetchone() works fine too.
My code is this:
import os
from impala.dbapi import connect
from impala.util import as_pandas
...
conn = connect(
    host=os.environ['jamHiveServer'],
    port=10000,
    auth_mechanism='GSSAPI',
    kerberos_service_name='hive'
)
cursor = conn.cursor()
cursor.execute(sQuery)
df = as_pandas(cursor)  # Crashes on this line
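For reference, the fetchone() check without as_pandas looks roughly like the sketch below. The helper name probe_first_row is made up for this post and is not part of impyla; it takes any DB-API cursor (such as the one from conn.cursor() above) and appends a LIMIT clause by plain string formatting, which assumes the query has no LIMIT of its own.

```python
def probe_first_row(cursor, query, limit=None):
    """Run query (optionally with a LIMIT appended) and try to fetch one row.

    Hypothetical helper for narrowing down the failure, not an impyla API.
    Returns (row, None) on success or (None, exception) on failure.
    """
    sql = query if limit is None else "{} LIMIT {}".format(query, limit)
    try:
        cursor.execute(sql)
        return cursor.fetchone(), None
    except Exception as exc:  # e.g. the TTransportException from thrift_sasl
        return None, exc
```

Calling probe_first_row(conn.cursor(), sQuery, limit=100) and then the same call without limit gives the two cases I describe: the first returns a row, the second returns the exception.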
The full error is this:
Traceback (most recent call last):
  File "/export/home/jm/Linux/see/see.py", line 159, in <module>
    objMain.main()
  File "/export/home/jm/Linux/see/see.py", line 98, in main
    df = factory.getDf(self.theArgs)
  File "/tech/home/jm/Linux/see/dfGetterFactory.py", line 23, in getDf
    return getter.getDf(args)
  File "/tech/home/jm/Linux/see/hive.py", line 39, in getDf
    df = as_pandas(cursor)
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/util.py", line 63, in as_pandas
    return DataFrame.from_records(cursor.fetchall(), columns=names,
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 536, in fetchall
    return list(self)
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 584, in __next__
    convert_types=self.convert_types)
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 1266, in fetch
    resp = self._rpc('FetchResults', req)
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 993, in _rpc
    response = self._execute(func_name, request)
  File "/export/home/jm/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 1010, in _execute
    return func(request)
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thriftpy2/thrift.py", line 219, in _req
    return self._recv(_api)
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thriftpy2/thrift.py", line 231, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thriftpy2/protocol/binary.py", line 373, in read_message_begin
    self.trans, strict=self.strict_read)
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thriftpy2/protocol/binary.py", line 165, in read_message_begin
    sz = unpack_i32(inbuf.read(4))
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 173, in read
    self._read_frame()
  File "/usr/app/anaconda_3/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 187, in _read_frame
    message=self.sasl.getError())
thrift.transport.TTransport.TTransportException: b'Error in sasl_decode (-1) SASL(-1): generic failure: Unable to find a callback: 32775'
Pip freeze is: