apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.5k stars, 13.76k forks

Error in SQL Lab or trying to create a Dataset based on Hive #28560

Open alessio-bernesco opened 5 months ago

alessio-bernesco commented 5 months ago

Bug description

After connecting to a Hive server (created through a Flink SQL Gateway), I can browse the schema and table list, but when showing the table schema there's an exception in the logs:

superset_app          | 2024-05-15 09:53:27,461:ERROR:flask_appbuilder.api:'bool' object has no attribute 'strip'
superset_app          | Traceback (most recent call last):
superset_app          |   File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 127, in wraps
superset_app          |     raise ex
superset_app          |   File "/app/superset/views/base_api.py", line 121, in wraps
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/core.py", line 1463, in time_function
superset_app          |     response = func(*args, **kwargs)
superset_app          |   File "/app/superset/utils/log.py", line 255, in wrapper
superset_app          |     value = f(*args, **kwargs)
superset_app          |   File "/app/superset/databases/api.py", line 741, in table_metadata
superset_app          |     table_info = get_table_metadata(database, table_name, schema_name)
superset_app          |   File "/app/superset/databases/utils.py", line 67, in get_table_metadata
superset_app          |     columns = database.get_columns(table_name, schema_name)
superset_app          |   File "/app/superset/models/core.py", line 847, in get_columns
superset_app          |     return self.db_engine_spec.get_columns(
superset_app          |   File "/app/superset/db_engine_specs/hive.py", line 419, in get_columns
superset_app          |     return BaseEngineSpec.get_columns(inspector, table_name, schema, options)
superset_app          |   File "/app/superset/db_engine_specs/base.py", line 1343, in get_columns
superset_app          |     cast(list[SQLAColumnType], inspector.get_columns(table_name, schema))
superset_app          |   File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
superset_app          |     col_defs = self.dialect.get_columns(
superset_app          |   File "/usr/local/lib/python3.10/site-packages/pyhive/sqlalchemy_hive.py", line 325, in get_columns
superset_app          |     rows = [[col.strip() if col else None for col in row] for row in rows]
superset_app          |   File "/usr/local/lib/python3.10/site-packages/pyhive/sqlalchemy_hive.py", line 325, in <listcomp>
superset_app          |     rows = [[col.strip() if col else None for col in row] for row in rows]
superset_app          |   File "/usr/local/lib/python3.10/site-packages/pyhive/sqlalchemy_hive.py", line 325, in <listcomp>
superset_app          |     rows = [[col.strip() if col else None for col in row] for row in rows]
superset_app          | AttributeError: 'bool' object has no attribute 'strip'

How to reproduce the bug

In SQL Lab or the "Create Dataset" area, try to connect to a Hive server created by Flink SQL Gateway. The connection succeeds and the table list is retrieved, but when browsing the table schema the exception is raised.

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

dosubot[bot] commented 5 months ago

The error you're encountering, "AttributeError: 'bool' object has no attribute 'strip'", means that a boolean value reached code that expects a string. According to the traceback, it is raised in the get_columns method of the pyhive.sqlalchemy_hive module (line 325), where every truthy value in the rows returned for the table is passed to .strip(). This happens when Superset tries to retrieve the table schema from your Hive server through the Flink SQL Gateway.
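The failing expression can be reproduced without a Hive server. The following is a minimal sketch, assuming a hypothetical DESCRIBE-style row that carries a boolean flag next to the column name and type. The sample row and the `strip_cells` helper are made up for illustration; only the list comprehension is taken from line 325 of the traceback.

```python
# Hypothetical row, for illustration only: column name, type, and a boolean flag.
rows = [("id", "INT", True)]


def strip_cells(rows):
    # Same expression as pyhive/sqlalchemy_hive.py line 325 in the traceback.
    return [[col.strip() if col else None for col in row] for row in rows]


try:
    strip_cells(rows)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'bool' object has no attribute 'strip'
```

Note that a `False` value would not raise: the `if col` guard turns it into `None`, so only `True` (or another truthy non-string value) reaches `.strip()`.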

To address this issue, consider the following steps:

  1. Review the Data Types: Ensure that the data passed into methods expecting strings are indeed strings. This might require checking the code that interfaces with the Hive server for any data type mismatches.

  2. Data Conversion: Before calling methods that operate on strings (like .strip()), verify the data type of the variable and convert it to a string if necessary.

  3. Superset and PyHive Versions: Ensure you're using the latest versions of Superset and PyHive, as updates may contain fixes for this issue. If the problem persists, consider also reporting it to the PyHive project, since the traceback shows the exception being raised inside pyhive/sqlalchemy_hive.py.

  4. Custom Code Checks: If you have custom code or plugins in Superset for interfacing with the Hive server, add explicit checks to handle or convert non-string types before they are processed further.

This approach targets the specific error by ensuring type compatibility and applying necessary updates or patches. For more detailed guidance, reviewing the Superset codebase and the PyHive documentation might provide additional insights.
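The type check and conversion described in step 2 could look roughly like the sketch below. The helpers `normalize_cell` and `normalize_rows` are hypothetical names, not part of PyHive or Superset, and the sample row is invented. Applying something like this would mean patching or subclassing the dialect, which is not shown here, and whether the rest of get_columns accepts the resulting row shape has not been verified.

```python
from typing import Any, Iterable, List, Optional, Sequence


def normalize_cell(col: Any) -> Optional[str]:
    """Return a stripped string, or None for NULL or blank cells."""
    if col is None:
        return None
    if not isinstance(col, str):
        # Assumption: non-string values (e.g. booleans) are kept as text.
        col = str(col)
    col = col.strip()
    return col or None


def normalize_rows(rows: Iterable[Sequence[Any]]) -> List[List[Optional[str]]]:
    """Apply normalize_cell to every cell of every row."""
    return [[normalize_cell(col) for col in row] for row in rows]


# Hypothetical row mixing strings, a boolean, NULL, and an empty string.
cleaned = normalize_rows([("id ", "INT", True, None, "")])
print(cleaned)  # [['id', 'INT', 'True', None, None]]
```

One design difference from the original expression: a `False` value is preserved as the string `"False"` here, whereas the original guard silently turned it into `None`.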

#### Details

**Sources**
- [superset/utils/core.py](https://github.com/apache/superset/blob/master/superset/utils/core.py)
- [superset/db_engine_specs/hive.py](https://github.com/apache/superset/blob/master/superset/db_engine_specs/hive.py)
- [superset/databases/schemas.py](https://github.com/apache/superset/blob/master/superset/databases/schemas.py)
- [superset/connectors/sqla/utils.py](https://github.com/apache/superset/blob/master/superset/connectors/sqla/utils.py)
- [superset/result_set.py](https://github.com/apache/superset/blob/master/superset/result_set.py)