Open waldenrt opened 4 years ago
from impala.dbapi import connect
from impala.util import as_pandas
import sqlalchemy
conn = connect(host='impalaprd.tenethealth.net',
port=21050,
use_ssl=True,
auth_mechanism='GSSAPI',
kerberos_service_name='impala')
cursor = conn.cursor()
engine = sqlalchemy.create_engine('impala://', creator=conn)
sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000'
cursor.execute(sql)
mypanda = as_pandas(cursor)
print(mypanda)
mypanda.to_sql('default.dbinfo_dataframe_bulkload', engine)
It' seems you issue is is about connection. In which line the error happens?
I am having the same problem
seems like the function expects a connection factory, instead of a single Hive connection.
/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in __connect(self, first_connect_check) 654 try: 655 self.starttime = time.time() --> 656 connection = pool._invoke_creator(self) 657 pool.logger.debug("Created new connection %r", connection) 658 self.connection = connection
/home/cdsw/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py in
TypeError: 'HiveServer2Connection' object is not callable
@rafaelreuber can you tell me what is your sqlalchemy version ?
I managed to make it work, but using the connection url :
engine = sqlalchemy.create_engine('impala://host:port/database?use_ssl=True&auth_mechanism=GSSAPI')
And then assigning String type to a TEXT field using dtype.
from sqlalchemy.types import String
mypanda.to_sql('poc_dashboard_test', engine, if_exists='append', index=False, dtype={'name':String()})
I have the following code:
from impala.dbapi import connect from impala.util import as_pandas import sys import sqlalchemy conn = connect(host='impalaprd.tenethealth.net', port=21050, use_ssl=True, auth_mechanism='GSSAPI', kerberos_service_name='impala') cursor = conn.cursor() engine = sqlalchemy.create_engine('impala://', creator=conn) sql = 'SELECT DBID from ACEDTA.DBINFO LIMIT 1000' cursor.execute(sql) mypanda = as_pandas(cursor) print(mypanda) mypanda.to_sql('default.dbinfo_dataframe_bulkload',engine)
When I call panda.to_sql, I get the following error:
'HiveServer2Connection' object is not callable
Am I missing a package installed in my environment or something with the code?
Thanks,
Richard