apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.81k stars 13.88k forks source link

parse error in select in cases of postgresSQL dialect #27549

Closed JurajBurian closed 8 months ago

JurajBurian commented 8 months ago

Bug description

I am testing the latest master image (created 2024-03-13T22:31:57.559418117Z ) and I see that a new problem is happening. The error occurs when a native filter or dashboard (chart) requests data. We are using Postgres and the parser throws an exception like this:

superset_app          | 2024-03-18 09:52:14,145:ERROR:superset.views.base:Expecting (. Line 24, Col: 29.
superset_app          |   ) || ' days'
superset_app          |  end as range,
superset_app          |  low::integer, high::integer
superset_app          | from ranges_input_array
superset_app          | )
superset_app          | ,farm_setting as materialized (
superset_app          | select 
superset_app          |  cross_table.id_farm::bigint as id_farm
superset_app          |  ,case when lower(cross_table.abortion_no_livebor
superset_app          | Traceback (most recent call last):
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
superset_app          |     rv = self.dispatch_request()
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
superset_app          |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app          |     return f(self, *args, **kwargs)
superset_app          |   File "/app/superset/views/base_api.py", line 127, in wraps
superset_app          |     raise ex
superset_app          |   File "/app/superset/views/base_api.py", line 121, in wraps
superset_app          |     duration, response = time_function(f, self, *args, **kwargs)
superset_app          |   File "/app/superset/utils/core.py", line 1476, in time_function
superset_app          |     response = func(*args, **kwargs)
superset_app          |   File "/app/superset/utils/log.py", line 255, in wrapper
superset_app          |     value = f(*args, **kwargs)
superset_app          |   File "/app/superset/charts/data/api.py", line 262, in data
superset_app          |     return self._get_data_response(
superset_app          |   File "/app/superset/charts/data/api.py", line 418, in _get_data_response
superset_app          |     result = command.run(force_cached=force_cached)
superset_app          |   File "/app/superset/commands/chart/data/get_data_command.py", line 45, in run
superset_app          |     payload = self._query_context.get_payload(
superset_app          |   File "/app/superset/common/query_context.py", line 100, in get_payload
superset_app          |     return self._processor.get_payload(cache_query_context, force_cached)
superset_app          |   File "/app/superset/common/query_context_processor.py", line 587, in get_payload
superset_app          |     query_results = [
superset_app          |   File "/app/superset/common/query_context_processor.py", line 588, in <listcomp>
superset_app          |     get_query_results(
superset_app          |   File "/app/superset/common/query_actions.py", line 225, in get_query_results
superset_app          |     return result_func(query_context, query_obj, force_cached)
superset_app          |   File "/app/superset/common/query_actions.py", line 103, in _get_full
superset_app          |     payload = query_context.get_df_payload(query_obj, force_cached=force_cached)
superset_app          |   File "/app/superset/common/query_context.py", line 121, in get_df_payload
superset_app          |     return self._processor.get_df_payload(
superset_app          |   File "/app/superset/common/query_context_processor.py", line 156, in get_df_payload
superset_app          |     query_result = self.get_query_result(query_obj)
superset_app          |   File "/app/superset/common/query_context_processor.py", line 234, in get_query_result
superset_app          |     result = query_context.datasource.query(query_object.to_dict())
superset_app          |   File "/app/superset/connectors/sqla/models.py", line 1714, in query
superset_app          |     query_str_ext = self.get_query_str_extended(query_obj)
superset_app          |   File "/app/superset/models/helpers.py", line 911, in get_query_str_extended
superset_app          |     sqlaq = self.get_sqla_query(**query_obj)
superset_app          |   File "/app/superset/models/helpers.py", line 1750, in get_sqla_query
superset_app          |     tbl, cte = self.get_from_clause(template_processor)
superset_app          |   File "/app/superset/connectors/sqla/models.py", line 1438, in get_from_clause
superset_app          |     from_sql = self.get_rendered_sql(template_processor)
superset_app          |   File "/app/superset/models/helpers.py", line 1085, in get_rendered_sql
superset_app          |     script = SQLScript(sql.strip("\t\r\n; "), engine=self.db_engine_spec.engine)
superset_app          |   File "/app/superset/sql_parse.py", line 343, in __init__
superset_app          |     for statement in parse(query, dialect=dialect)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/__init__.py", line 86, in parse
superset_app          |     return Dialect.get_or_raise(read or dialect).parse(sql, **opts)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 442, in parse
superset_app          |     return self.parser(**opts).parse(self.tokenize(sql), sql)
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 1026, in parse
superset_app          |     return self._parse(
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 1092, in _parse
superset_app          |     expressions.append(parse_method(self))
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 1287, in _parse_statement
superset_app          |     expression = self._parse_set_operations(expression) if expression else self._parse_select()
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 2211, in _parse_select
superset_app          |     cte = self._parse_with()
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 2321, in _parse_with
superset_app          |     expressions.append(self._parse_cte())
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 2339, in _parse_cte
superset_app          |     exp.CTE, this=self._parse_wrapped(self._parse_statement), alias=alias
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 5092, in _parse_wrapped
superset_app          |     self.raise_error("Expecting (")
superset_app          |   File "/usr/local/lib/python3.9/site-packages/sqlglot/parser.py", line 1136, in raise_error
superset_app          |     raise error
superset_app          | sqlglot.errors.ParseError: Expecting (. Line 24, Col: 29.
superset_app          |   ) || ' days'
superset_app          |  end as range,
superset_app          |  low::integer, high::integer
superset_app          | from ranges_input_array
superset_app          | )
superset_app          | ,farm_setting as materialized (
superset_app          | select 
superset_app          |  cross_table.id_farm::bigint as id_farm
superset_app          |  ,case when lower(cross_table.abortion_no_livebor

problem is in ... as materialized ... part. When I remove materialized keyword, everything is working fine. I am 100% that previous build of master is working correctly - we alredy use it in production ! Also my coluege find that there is problem wit LATERAL keyword, here is example : SELECT * FROM table_a a, inner join LATERAL ( SELECT value FROM table_b WHERE table_b.id = a.id)

Not sure what is root cause of the problem. May be a new version of sqlglot library or dialect is not recognized ?

How to reproduce the bug

  1. create select like this :
    with data as materialized (
    select a.n/100 as x, 10*a.n/30 as y from generate_series(1, 1000) as a(n)
    ) select * from data
  2. create dataset with this select
  3. create chart and add chart to dashboar
  4. show dashboard

Screenshots/recordings

image

Superset version

master / latest-dev

Python version

I don't know

Node version

I don't know

Browser

Chrome

Additional context

No response

Checklist

betodealmeida commented 8 months ago

Taking a look.

john-bodley commented 8 months ago

@JurajBurian per the PostgreSQL documentation it seems like your WITH statement might not be correct, i.e., it should be of the form:

WITH data AS ( ...

That said some of these examples can't be parsed by SQLGlot which use the NOT MATERIALIZED construct. This would likely indicate an issue with the SQLGlot parser.

JurajBurian commented 8 months ago

@john-bodley in documnetation you mentioned is part about materialization (see: 7.8.3. Common Table Expression Materialization) . One can write with xxx as (not) materialized ( ... ) select xxx .... Here is screnshot from dbeaver running my query ...

Screenshot_20240319_183953

As I mentioned, there is probably problem with sqlglot (may be somebody changed version ?) or there is not correctly detected sql dialect.

We have strored fresh but not last "master" docker image (created: 2024-03-05T13:05:22.874959317Z) that is able parse our selects corectly.
We did't have any problem with this in past.

betodealmeida commented 8 months ago

Right, it is a valid query and should work unquoted (double quoting it is a syntax error).

@JurajBurian is it easy for to test with a newer version of sqlglot?

Though I think this is caused by how we wrap the SQL Lab query when creating charts. I was able to repro, I'm taking a look now.

betodealmeida commented 8 months ago

I filed a ticked upstream (https://github.com/tobymao/sqlglot/issues/3170), I'm going to give it a try and see if it's something easy to fix.

betodealmeida commented 8 months ago

Fixed in https://github.com/tobymao/sqlglot/pull/3171 in record time, I'll bump the version in Superset as soon as there's a new release.

JurajBurian commented 8 months ago

Hello Folks, I want just inform you for future. I found https://codapi.org/try/sqlglot/ site, where one can test sqlglot. (and yes, version used by site has bug with materialized )