cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0
730 stars 248 forks source link

Issue writing Panda Dataframe back to Impala #379

Open waldenrt opened 4 years ago

waldenrt commented 4 years ago

I have the following code:

from impala.dbapi import connect from impala.util import as_pandas import sys import sqlalchemy conn = connect(host='impalaprd.tenethealth.net', port=21050, use_ssl=True, auth_mechanism='GSSAPI', kerberos_service_name='impala') cursor = conn.cursor() engine = sqlalchemy.create_engine('impala://', creator=conn) sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000' cursor.execute(sql) mypanda = as_pandas(cursor) print(mypanda) mypanda.to_sql('default.dbinfo_dataframe_bulkload',engine)

When I call panda.to_sql, I get the following error:

'HiveServer2Connection' object is not callable

Am I missing a package installed in my environment or something with the code?

Thanks,

Richard

rafaelreuber commented 4 years ago
from impala.dbapi import connect
from impala.util import as_pandas
import sqlalchemy

conn = connect(host='impalaprd.tenethealth.net',
    port=21050,
    use_ssl=True,
    auth_mechanism='GSSAPI',
    kerberos_service_name='impala')

cursor = conn.cursor()
engine = sqlalchemy.create_engine('impala://', creator=conn)
sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000'
cursor.execute(sql)

mypanda = as_pandas(cursor)
print(mypanda)

mypanda.to_sql('default.dbinfo_dataframe_bulkload', engine)

It' seems you issue is is about connection. In which line the error happens?

miagao commented 4 years ago

I am having the same problem

seems like the function expects a connection factory, instead of a single Hive connection.

/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check) 654 try: 655 self.starttime = time.time() --> 656 connection = pool._invoke_creator(self) 657 pool.logger.debug("Created new connection %r", connection) 658 self.connection = connection

/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in (crec) 247 argspec = util.get_callable_argspec(self._creator, no_self=True) 248 except TypeError: --> 249 return lambda crec: creator() 250 251 defaulted = argspec[3] is not None and len(argspec[3]) or 0

TypeError: 'HiveServer2Connection' object is not callable

miagao commented 4 years ago

@rafaelreuber can you tell me what is your sqlalchemy version ?

miagao commented 4 years ago

I managed to make it work, but using the connection url :

engine = sqlalchemy.create_engine('impala://host:port/database?use_ssl=True&auth_mechanism=GSSAPI')

And then assigning String type to a TEXT field using dtype.

from sqlalchemy.types import String

mypanda.to_sql('poc_dashboard_test', engine, if_exists='append', index=False, dtype={'name':String()})