cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

Impala with mysql backend failing on creating table #137

Closed catchthemonster closed 8 years ago

catchthemonster commented 8 years ago

It looks like pandas, via the DB API 2.0 connection, is failing on creating a table. Specifically, pandas' has_table seems to be what fails:

    def has_table(self, name, schema=None):
        escape = _SQL_GET_IDENTIFIER[self.flavor]
        esc_name = escape(name)
        wld = _SQL_WILDCARD[self.flavor]
        flavor_map = {
            'sqlite': ("SELECT name FROM sqlite_master "
                       "WHERE type='table' AND name=%s;") % wld,
            'mysql': "SHOW TABLES LIKE %s" % wld}
        query = flavor_map.get(self.flavor)

        return len(self.execute(query, [name,]).fetchall()) > 0

example:

    df.to_sql(name="test_df", con=conn, index=False, if_exists='replace', flavor="mysql")

    pandas.io.sql.DatabaseError: Execution failed on sql: SHOW TABLES LIKE %s
    AnalysisException: Syntax error in line 1: SHOW TABLES LIKE %s
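The unsubstituted %s is the problem here: Impala only accepts a quoted string literal as the LIKE pattern, so the placeholder itself is a syntax error. Run directly against an impyla cursor (hypothetical table name), the contrast is:

    cur.execute("SHOW TABLES LIKE 'test_df'")   # quoted literal pattern: accepted by Impala
    cur.execute("SHOW TABLES LIKE %s")          # no parameters passed, so the literal %s reaches
                                                # Impala and triggers the AnalysisException above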

laserson commented 8 years ago

Could you post the exact code you're running that generates the error along with the entire stacktrace?

This appears to be because our _bind_parameters function only supports qmark and numeric parameter styles when you provide a list, and the pandas machinery is using the format style. I created #138 to track this. Thoughts on a workaround, @wesm?
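For reference, PEP 249 names the placeholder styles involved: qmark uses ?, numeric uses :1, and format uses %s. A simplified stand-in (not impyla's actual _bind_parameters) shows the mismatch:

    # Minimal sketch of list-parameter binding in the qmark style.
    def bind_qmark(query, params):
        # substitute each '?' with a quoted value, left to right
        for value in params:
            query = query.replace('?', "'%s'" % value, 1)
        return query

    bind_qmark("SHOW TABLES LIKE ?", ["test_df"])
    # -> "SHOW TABLES LIKE 'test_df'"

    # pandas' legacy 'mysql' flavor emits format-style SQL instead:
    #   "SHOW TABLES LIKE %s"
    # If the driver never substitutes the placeholder, Impala receives the
    # literal %s, which is exactly the AnalysisException reported above.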

catchthemonster commented 8 years ago

Hi, I love this idea with Impala and pandas. I talked to Wes about this approach for storing TBs of data last week at the Strata+Hadoop conference in NYC... I digress, sorry. Anyway, here is the code; btw, it is the example from the Cloudera blog post by Randy Carnevale (May 09, 2014).

    from impala.dbapi import connect  # impyla's DB API module

    host = "some_machine"
    port = 21050
    conn = connect(host, port)

    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS test_df")

The table is created here, though it isn't strictly necessary ...

cur.execute("CREATE TABLE test_df (a int, b int, c int, d int)") df = pd.DataFrame(np.reshape(range(16), (4, 4)), columns=['a', 'b', 'c', 'd'])

The failure is below:

    df.to_sql(name="test_df", con=conn, index=False, if_exists='replace', flavor="mysql")

    /cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py:599: FutureWarning: The 'mysql' flavor with DBAPI connection is deprecated and will be removed in future versions. MySQL will be further supported with SQLAlchemy engines.
      warnings.warn(_MYSQL_WARNING, FutureWarning)
    Traceback (most recent call last):
      File "/home/sasha/PyChramProjects/Impala/ice/main/PyImpala.py", line 18, in <module>
        df.to_sql(name="test_df", con=conn, index=False, if_exists='replace', flavor="mysql")
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/core/generic.py", line 982, in to_sql
        dtype=dtype)
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 549, in to_sql
        chunksize=chunksize, dtype=dtype)
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 1566, in to_sql
        table.create()
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 648, in create
        if self.exists():
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 636, in exists
        return self.pd_sql.has_table(self.name, self.schema)
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 1579, in has_table
        return len(self.execute(query, [name,]).fetchall()) > 0
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 1480, in execute
        raise_with_traceback(ex)
      File "/cube/opt/python27/lib/python2.7/site-packages/pandas/io/sql.py", line 1476, in execute
        self.con.rollback()
      File "/cube/opt/python27/lib/python2.7/site-packages/impala/hiveserver2.py", line 76, in rollback
        raise NotSupportedError
    pandas.io.sql.DatabaseError: Execution failed on sql: SHOW TABLES LIKE %s
    AnalysisException: Syntax error in line 1:
    SHOW TABLES LIKE %s
                     ^
    Encountered: %
    Expected: STRING LITERAL
    CAUSED BY: Exception: Syntax error
    unable to rollback
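For context on the trailing "unable to rollback": pandas' fallback execute path catches the original error and calls con.rollback() before re-raising, but impyla's HiveServer2 connection has no transactions, so rollback itself raises NotSupportedError. Roughly (paraphrased control flow, not the exact pandas source; assumes the conn and cur from above):

    from pandas.io.sql import DatabaseError

    query, params = "SHOW TABLES LIKE %s", ["test_df"]
    try:
        cur.execute(query, params)     # Impala rejects the literal %s placeholder
    except Exception as exc:
        try:
            conn.rollback()            # HiveServer2 has no transactions: NotSupportedError
        except Exception:
            # pandas wraps the original error and appends "unable to rollback"
            raise DatabaseError("Execution failed on sql: %s\n%s\nunable to rollback"
                                % (query, exc))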

It seems the failure that ends up in hiveserver2.py is triggered by

    return len(self.execute(query, [name,]).fetchall()) > 0

but the catch is that self is SQLiteDatabase, not a mysql backend. Am I supposed to use SQLAlchemy, or can I push this through pandas ...

This would be a perfect solution for heavy ingestion workflow tasks through pandas and multiprocessing ... Regards,
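One possible workaround until the fix lands (a sketch, not something from the thread): skip pandas' mysql flavor entirely and write the frame with plain impyla calls, keeping every pattern a quoted literal so no %s placeholder reaches Impala. It assumes the conn and df from the snippets above and only makes sense for small frames; TB-scale ingestion would normally go through files loaded into HDFS instead.

    # Hypothetical workaround: create and populate the table without to_sql.
    cur = conn.cursor()
    cur.execute("SHOW TABLES LIKE 'test_df'")           # quoted literal, no placeholder
    if len(cur.fetchall()) == 0:
        cur.execute("CREATE TABLE test_df (a INT, b INT, c INT, d INT)")

    # Build a multi-row INSERT ... VALUES statement by hand.
    rows = ", ".join("(%d, %d, %d, %d)" % tuple(r)
                     for r in df.itertuples(index=False))
    cur.execute("INSERT INTO test_df VALUES %s" % rows)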

laserson commented 8 years ago

Try the latest master

laserson commented 8 years ago

See #138