dropbox / PyHive

Python interface to Hive and Presto. 🐝
Other
1.67k stars 550 forks source link

metadata.reflect() not working causing `DESCRIBE default` because of possible bug in get_table_names() #461

Open AntonOvsyannikov opened 1 year ago

AntonOvsyannikov commented 1 year ago

The following snippet (assuming hive.thriftserver running on 10000 port)

engine = create_engine('hive://localhost:10000/default')
metadata = MetaData(bind=engine)
metadata.reflect()

causes

sqlalchemy.exc.OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, 
...
errorMessage="Error running query: org.apache.spark.sql.AnalysisException: Table or view not found: default; line 1 pos 9;\n'DescribeRelation false, [col_name#7203, data_type#7204, comment#7205]\n+- 'UnresolvedTableOrView [default], DESCRIBE TABLE, true\n"), operationHandle=None)
[SQL: DESCRIBE default]

This is because of possibly incorrect implementation of get_table_names in sqlalchemy_hive.py::376 (v0.7.1.dev0)

    def get_table_names(self, connection, schema=None, **kw):
        query = 'SHOW TABLES'
...
        return [row[0] for row in connection.execute(text(query))]

while in row[0] in spark sql is schema name

SHOW TABLES;
  +-----------+------------+--------------+--+
  | database  | tableName  | isTemporary  |
  +-----------+------------+--------------+--+
  | default   | sam        | false        |
  | default   | sam1       | false        |
  | default   | suj        | false        |
  +-----------+------------+--------------+--+

so proposed fix is

    def get_table_names(self, connection, schema=None, **kw):
        query = 'SHOW TABLES'
        if schema:
            query += ' IN ' + self.identifier_preparer.quote_identifier(schema)
        return [row[1] for row in connection.execute(text(query))]
pedrosalgadowork commented 8 months ago

Same issue here with pyhive 7.0 and sqlalchemy 2.0.