Open rav009 opened 5 years ago
I have some clues. It seems you cannot use 'select *' on a table with nested-type columns (e.g. map type); otherwise you get this error.
Can you include the output of "DESCRIBE store_dw_des.dw_loc_sku_day_actual_e_business"?
Same error different version:
Something strange is happening in my case. Everything works when I set the LIMIT to 138, but when I change it to 139 I get the same error as rav009. The behavior is identical when I change the query: it still fails once LIMIT reaches 139. Any LIMIT between 139 and roughly 500 yields the same error; 500 and up returns a different error (OverflowError: Python int too large to convert to C long), which I pasted below for reference.
Any ideas on what is causing this?
Many thanks!
env: Python 3.6.2, thrift 0.13.0, thrift-sasl 0.4a1, thriftpy2 0.4.10, impyla 0.16.2
My code:
import pandas as pd
from impala.dbapi import connect
from impala.util import as_pandas

conn_inter = connect(
    host=DRONA_IMPALA_HOST,
    port=DRONA_IMPALA_PORT,
    use_ssl=True,
    ca_cert=None,
    auth_mechanism='PLAIN',
    user=IMPALA_USER,
    password=IMPALA_PASSWORD,
)
cursor = conn_inter.cursor()
table = 'sw_os.min_data_kudu'
cursor.execute(f'SELECT * FROM {table} LIMIT 139')
data = as_pandas(cursor)
TypeError Traceback (most recent call last)
@EdTheEagle can you share the version of impala (i.e. output of select version()) and the output of "describe table" (if not the column names, at least the types).
@timarmstrong i have the same problem. impala = 2.12.0
id - double
crt_mnemo - string
from impala.dbapi import connect
from impala.util import as_pandas
import pandas as pd
impala_conn = connect(host='hostname', port=21050, auth_mechanism='GSSAPI', timeout=100000, use_ssl=True, ca_cert=None, ldap_user=None, ldap_password=None, kerberos_service_name='impala')
df = pd.read_sql("select id, crt_mnemo from demo_db.stg_deals_opn LIMIT 1000", impala_conn)
print(df)
@EdTheEagle did you solve this problem?
@MacJei
I did not solve the problem. I work in a company where there might be a download limit set on these kinds of calls.
I am using the pyodbc package which works for me and I did not investigate further.
Good luck!
We appear to have the same problem (SSL: "OverflowError: Python int too large to convert to C long").
Trying to connect from Python 3.6 (tried from Windows 10 and from Red Hat Linux) to Cloudera Impala on a kerberized Oracle Big Data Appliance.
Our code is pretty much the same as the code example given by the OP, only without user/password as we're using ticketing/winkerberos. Like the OP, we don't have this error if we limit the size of the result (by querying small tables or using LIMIT), but we do if the result is anything more than a few KB or so.
I am happy to provide more details if needed.
Hello I also have the same issue.
Hopefully this helps someone: I had pretty much the same issue, with 2 different errors depending on my LIMIT.
'TypeError: bytes expected'
'OverflowError: signed integer is greater than maximum'.
What seems to have solved it was setting the buffersize + thrift request size to conservative values, as they seem to default to something that can overflow.
cursor.set_arraysize(10)
cursor.execute("set batch_size=10")
I was able to retrieve 10k rows like this, whereas before it was crashing at 60.
I got the idea from https://issues.apache.org/jira/browse/IMPALA-1618
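For anyone who wants to try this workaround in one place, here is a minimal sketch of the two settings above wrapped in a helper. The helper name `fetch_with_small_batches` and its `batch_rows` parameter are made up for illustration; the only calls it relies on are `set_arraysize`, `execute`, and `fetchmany` on an impyla cursor. Whether 10 is the right value for a given cluster is an assumption, not something confirmed in this thread.

```python
from typing import Any, Iterator, Tuple


def fetch_with_small_batches(cursor: Any, query: str, batch_rows: int = 10) -> Iterator[Tuple]:
    """Yield rows from `query`, keeping each fetch request small.

    `cursor` is expected to be an impyla cursor (hypothetical helper,
    not part of impyla itself).
    """
    # Client side: cap how many rows are requested per fetch.
    cursor.set_arraysize(batch_rows)
    # Server side: ask Impala for small row batches via the batch_size option.
    cursor.execute("set batch_size={}".format(int(batch_rows)))
    cursor.execute(query)
    while True:
        rows = cursor.fetchmany(batch_rows)
        if not rows:
            break
        for row in rows:
            yield row
```

With a real connection this would be called as `rows = list(fetch_with_small_batches(conn.cursor(), "select ... limit 10000"))`. Small batches mean more round trips, so it will likely be slower than the defaults when those work.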
@aconstantin2 After digging for hours, your solution worked! Thank you!
env: Python 3.5.1, thrift 0.11.0, thrift-sasl 0.3.0, thriftpy 0.3.9, impyla 0.14.2.2
my code:
from impala.dbapi import connect
from impala.util import as_pandas
icon = connect(host='bd-slave07-pe2.f.com', port=21050, user='username', auth_mechanism='GSSAPI', password='psd')
cs = icon.cursor()
cs.execute('select * from table limit 100')
df = as_pandas(cs)
error msg:
/opt/python3.5/lib/python3.5/site-packages/impala/hiveserver2.py in __init__(self, trowset, schema, convert_types)
    853
    854 is_null = bitarray(endian='little')
--> 855 is_null.frombytes(nulls)
    856
    857 # Ref HUE-2722, HiveServer2 sometimes does not add trailing '\x00'
TypeError: byte string expected