apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Starrocks executing a specific query will cause the Query history page to report an error and not load the data #29991

Open kainchow opened 3 weeks ago

kainchow commented 3 weeks ago

Bug description

Executing a specific query against Starrocks causes the Query history page to report an error and fail to load its data. Error message: "An error occurred while fetching Query historys: Fatal error" (screenshot: Snipaste_2024-08-22_15-14-36)

https://github.com/user-attachments/assets/dc4bf4da-4725-47ac-9545-ceaac1e429ae

How to reproduce the bug

  1. Create a data source of type MySQL and fill in the Starrocks cluster address and account credentials.
  2. Go to the Query history page (/sqllab/history/). At this point, the query records display normally.
  3. Go to SQL Lab and select the Starrocks data source you just created.
  4. Execute the following SQL: select date_add(current_date, -1) as yst_date
  5. Return to the Query history page. The page now reports an error and the query history cannot be browsed.

Screenshots/recordings

superset_app container logs:

2024-08-22 06:53:04,605:ERROR:flask_appbuilder.api:list index out of range
Traceback (most recent call last):
  File "/app/superset/sql_parse.py", line 297, in _extract_tables_from_sql
    statements = parse(self.stripped(), dialect=self._dialect)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/__init__.py", line 87, in parse
    return Dialect.get_or_raise(read or dialect).parse(sql, **opts)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 490, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1153, in parse
    return self._parse(
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1219, in _parse
    expressions.append(parse_method(self))
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 1427, in _parse_statement
    expression = self._parse_set_operations(expression) if expression else self._parse_select()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2486, in _parse_select
    from_ = self._parse_from()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2693, in _parse_from
    exp.From, comments=self._prev_comments, this=self._parse_table(joins=joins)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
    subquery = self._parse_select(table=True)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2501, in _parse_select
    self._parse_table()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3067, in _parse_table
    subquery = self._parse_select(table=True)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2491, in _parse_select
    this = self._parse_query_modifiers(this)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 2639, in _parse_query_modifiers
    key, expression = parser(self)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 942, in <lambda>
    TokenType.WHERE: lambda self: ("where", self._parse_where()),
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3394, in _parse_where
    exp.Where, comments=self._prev_comments, this=self._parse_conjunction()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3704, in _parse_conjunction
    return self._parse_tokens(self._parse_equality, self.CONJUNCTION)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3707, in _parse_equality
    return self._parse_tokens(self._parse_comparison, self.EQUALITY)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5541, in _parse_tokens
    expression=parse_method(),
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3710, in _parse_comparison
    return self._parse_tokens(self._parse_range, self.COMPARISON)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3713, in _parse_range
    this = this or self._parse_bitwise()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3832, in _parse_bitwise
    this = self._parse_term()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3864, in _parse_term
    return self._parse_tokens(self._parse_factor, self.TERM)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5534, in _parse_tokens
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3868, in _parse_factor
    this = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3889, in _parse_unary
    return self._parse_at_time_zone(self._parse_type())
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
    return super()._parse_type(parse_interval=parse_interval)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3910, in _parse_type
    data_type = self._parse_types(check_func=True, allow_identifiers=False)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4005, in _parse_types
    expressions = self._parse_csv(self._parse_type_size)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 5520, in _parse_csv
    parse_result = parse_method()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3927, in _parse_type_size
    this = self._parse_type()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/mysql.py", line 602, in _parse_type
    return super()._parse_type(parse_interval=parse_interval)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 3911, in _parse_type
    this = self._parse_column()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4113, in _parse_column
    this = self._parse_column_reference()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4117, in _parse_column_reference
    this = self._parse_field()
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4232, in _parse_field
    or self._parse_function(anonymous=anonymous_func)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4253, in _parse_function
    func = self._parse_function_call(
  File "/usr/local/lib/python3.10/site-packages/sqlglot/parser.py", line 4319, in _parse_function_call
    func = function(args)
  File "/usr/local/lib/python3.10/site-packages/sqlglot/dialects/dialect.py", line 707, in _builder
    raise ParseError(f"INTERVAL expression expected but got '{interval}'")
sqlglot.errors.ParseError: INTERVAL expression expected but got '-1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 182, in wraps
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 1711, in get_list
    return self.get_list_headless(**kwargs)
  File "/app/superset/queries/api.py", line 340, in get_list_headless
    response[flask_appbuilder.const.API_RESULT_RES_KEY] = list_model_schema.dump(lst, many=True)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 557, in dump
    result = self._serialize(processed_obj, many=many)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 519, in _serialize
    return [
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 520, in <listcomp>
    self._serialize(d, many=False)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/schema.py", line 525, in _serialize
    value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 344, in serialize
    return self._serialize(value, attr, obj, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/marshmallow/fields.py", line 1991, in _serialize
    return self._serialize_method(obj)
  File "/app/superset/queries/schemas.py", line 76, in get_sql_tables
    return obj.sql_tables
  File "/app/superset/models/sql_lab.py", line 75, in sql_tables
    extract_tables_from_jinja_sql(
  File "/app/superset/sql_parse.py", line 1126, in extract_tables_from_jinja_sql
    ).tables
  File "/app/superset/sql_parse.py", line 287, in tables
    self._tables = self._extract_tables_from_sql()
  File "/app/superset/sql_parse.py", line 303, in _extract_tables_from_sql
    ex.errors[0]
IndexError: list index out of range

2024-08-22 06:53:04,613:INFO:werkzeug:192.168.10.1 - - [22/Aug/2024 06:53:04] "GET /api/v1/query/?q=(filters:!((col:database,opr:rel_o_m,value:2)),order_column:start_time,order_direction:desc,page:0,page_size:25) HTTP/1.1" 500 -
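The logs show two failures in sequence: the parser raises a ParseError, and then the handler in _extract_tables_from_sql raises IndexError on ex.errors[0]. A minimal sketch of why that can happen is below. It assumes the exception was raised with only a message, so its errors list is empty; that matches the IndexError in the log but is not confirmed anywhere in this thread. StandInParseError and first_error_message are hypothetical names used for illustration and are not part of sqlglot or Superset.

```python
class StandInParseError(Exception):
    """Hypothetical stand-in for a parser exception that carries an `errors` list."""

    def __init__(self, message, errors=None):
        super().__init__(message)
        # Structured details are optional; a message-only raise leaves this empty.
        self.errors = errors or []


def first_error_message(ex):
    """Return the first structured description if present, else the plain message."""
    if ex.errors:
        return ex.errors[0].get("description", str(ex))
    return str(ex)


# Raised with a message only, so `errors` is an empty list (assumption).
message_only = StandInParseError("INTERVAL expression expected but got '-1'")

# Indexing the empty list is what an unguarded handler would do.
unguarded_failure = None
try:
    message_only.errors[0]
except IndexError as index_error:
    unguarded_failure = str(index_error)

# The guarded helper falls back to the exception message instead.
guarded_message = first_error_message(message_only)

# When structured details exist, the helper prefers them.
structured = StandInParseError("bad sql", errors=[{"description": "unexpected token"}])
structured_message = first_error_message(structured)
```

Under that assumption, checking whether the list is non-empty before indexing would let the original parse error surface instead of being replaced by an unrelated IndexError.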

Superset version

4.0.2

Python version

3.10

Node version

16

Browser

Firefox

Additional context

No response

dosubot[bot] commented 3 weeks ago

To resolve the "list index out of range" error in the Query history page when executing a specific query in Starrocks on Apache Superset, you should check the handling of the statements list in the execute_sql_statements function. Specifically, ensure that the list of statements is not empty before attempting to access its elements.

Here is a potential fix:

  1. Add a check to ensure that statements is not empty before proceeding with the execution loop.
def execute_sql_statements(
    # pylint: disable=too-many-arguments, too-many-locals, too-many-statements, too-many-branches
    query_id: int,
    rendered_query: str,
    return_results: bool,
    store_results: bool,
    start_time: Optional[float],
    expand_data: bool,
    log_params: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Executes the sql query returns the results."""
    if store_results and start_time:
        # only asynchronous queries
        stats_logger.timing("sqllab.query.time_pending", now_as_float() - start_time)

    query = get_query(query_id)
    payload: dict[str, Any] = {"query_id": query_id}
    database = query.database
    db_engine_spec = database.db_engine_spec
    db_engine_spec.patch()

    if database.allow_run_async and not results_backend:
        raise SupersetErrorException(
            SupersetError(
                message=__("Results backend is not configured."),
                error_type=SupersetErrorType.RESULTS_BACKEND_NOT_CONFIGURED_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    # Breaking down into multiple statements
    parsed_query = ParsedQuery(
        rendered_query,
        engine=db_engine_spec.engine,
    )
    if not db_engine_spec.run_multiple_statements_as_one:
        statements = parsed_query.get_statements()
        logger.info(
            "Query %s: Executing %i statement(s)", str(query_id), len(statements)
        )
    else:
        statements = [rendered_query]
        logger.info("Query %s: Executing query as a single statement", str(query_id))

    # Check if statements list is empty
    if not statements:
        raise SupersetErrorException(
            SupersetError(
                message=__("No valid SQL statements found."),
                error_type=SupersetErrorType.INVALID_SQL_STATEMENT_ERROR,
                level=ErrorLevel.ERROR,
            )
        )

    logger.info("Query %s: Set query to 'running'", str(query_id))
    query.status = QueryStatus.RUNNING
    query.start_running_time = now_as_float()
    db.session.commit()

    # Rest of the function...

This check ensures that the statements list is not empty before attempting to execute any SQL statements, which should prevent the "list index out of range" error [1].


rusackas commented 3 weeks ago

Maybe @miomiocat has some insight here?

kainchow commented 3 weeks ago

I made a temporary change to superset/sql_parse.py which avoids this issue, but it may cause other problems, so someone still needs to fix the bug!

def _extract_tables_from_sql(self) -> set[Table]:
    """
    Extract all table references in a query.

    Note: this uses sqlglot, since it's better at catching more edge cases.
    """
    try:
        statements = parse(self.stripped(), dialect=self._dialect)
    except ParseError:
        # Temporary workaround: treat an unparseable query as referencing no tables
        statements = []
    except SqlglotError as ex:
        ...
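
The workaround above swallows the parse failure and continues with an empty statement list. The same idea can be sketched outside Superset as a small wrapper that returns an empty set when table extraction fails, so that one unparseable row does not break a whole listing. Everything here is an assumption for illustration: tables_or_empty, failing_extractor and naive_extractor are hypothetical helpers, and the naive extractor is not a real SQL parser.

```python
import logging

logger = logging.getLogger(__name__)


def tables_or_empty(sql, extract_tables):
    """Call `extract_tables(sql)`; on any failure, log it and return an empty set."""
    try:
        return set(extract_tables(sql))
    except Exception:  # deliberately broad: the result is display-only metadata
        logger.warning("Could not extract tables from query", exc_info=True)
        return set()


def failing_extractor(sql):
    # Hypothetical extractor that always fails, standing in for a parser error.
    raise ValueError("cannot parse")


def naive_extractor(sql):
    # Hypothetical extractor: returns the word following each FROM keyword.
    words = sql.split()
    lowered = [word.lower() for word in words]
    return {words[i + 1] for i, word in enumerate(lowered[:-1]) if word == "from"}


broken_result = tables_or_empty("select date_add(current_date, -1)", failing_extractor)
normal_result = tables_or_empty("select a from t", naive_extractor)
```

The trade-off is the one already noted in the comment: an empty result hides the failure from any caller that relies on the table list, so a fallback like this suits display code better than anything that makes decisions based on the tables found.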