apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.52k stars 13.07k forks source link

Calculated column jinja templating fails in ORDER BY clause #29378

Open lilotter45 opened 1 week ago

lilotter45 commented 1 week ago

Bug description

When attempting to use a calculated column that uses jinja templating as the metric in a bar chart, the jinja placeholders are included in the ORDER BY clause of the executed query, rather than being replaced with rendered SQL string. The template renders correctly in the SELECT statement of the same query.

In this example, the calculated column is defined as

IIF(MeasureValue < 
    {% if filter_values('fresh_min_bcs')|length %} 
        {{ filter_values('fresh_min_bcs')[0] }} 
    {% else %} 
        2.75 
    {% endif %},
    1,
    0
)

The executed query (trimmed for brevity) is

WITH cteMeasure AS (SELECT ... FROM ...),
     __cte AS (SELECT ... )
SELECT TOP 10000 rpt.DateTrunc('QUARTER' , MeasureDate) AS [MeasureDate_Group],
           IIF(SortNum >= 3 , '3' , CAST(SortNum AS VARCHAR(2))) AS [SortGroup],
-- The template renders correctly on the following line
           sum(IIF(MeasureValue < 2.75, 1, 0)) AS [COUNT(Is Low BCS)]
FROM __cte
WHERE ...
GROUP BY ...
-- The template does not render correctly in the ORDER BY clause on the following line
ORDER BY sum(IIF(MeasureValue < {% if filter_values('fresh_min_bcs')|length %} {{ filter_values('fresh_min_bcs')[0] }} {% else %} 2.75 {% endif %}, 1, 0)) DESC;

How to reproduce the bug

  1. Create a dataset with a calcuated field that uses jinja templating
  2. Create a Bar Chart based on the above dataset
  3. Add the calculated column as a METRIC; the aggregration function does not matter
  4. Click UPDATE CHART
  5. An "Unexpected error" will be returned, citing "Incorrect syntax near '%'"; clicking on ... and "View query" will show the template text rather than rendered text in the ORDER BY clause of the executed query

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.10

Node version

Not applicable

Browser

Chrome

Additional context

Checklist

dosubot[bot] commented 1 week ago

Hey there, @lilotter45! I'm here to help you with any issues you're facing. I can assist with bug fixing, answering questions, and even guide you on becoming a contributor. Let me know how I can assist you!

To resolve the issue where Jinja placeholders in a calculated column are not being replaced with the rendered SQL string in the ORDER BY clause of a query in Superset, you can ensure that the calculated column is properly defined and referenced in the query object. The provided test case test_calculated_column_in_order_by demonstrates how to define a calculated column and use it in the ORDER BY clause.

Here is the relevant part of the test case:

@mock.patch("superset.models.core.Database.db_engine_spec", BigQueryEngineSpec)
@mock.patch("sqlalchemy_bigquery._helpers.create_bigquery_client", mock.Mock)
@pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
def test_calculated_column_in_order_by(self):
    table = self.get_table(name="birth_names")
    TableColumn(
        column_name="gender_cc",
        type="VARCHAR(255)",
        table=table,
        expression="""
        case
          when gender='boy' then 'male'
          else 'female'
        end
        """,
    )

    table.database.sqlalchemy_uri = "bigquery://"
    query_obj = {
        "groupby": ["gender_cc"],
        "is_timeseries": False,
        "filter": [],
        "orderby": [["gender_cc", True]],
    }
    sql = table.get_query_str(query_obj)
    assert "ORDER BY\n  `gender_cc` ASC" in sql

In this test case:

  1. A calculated column gender_cc is defined with a SQL expression.
  2. The query_obj includes the groupby and orderby clauses referencing the calculated column gender_cc.
  3. The generated SQL string is checked to ensure that the ORDER BY clause correctly includes the calculated column [1].

Additionally, ensure that the SQL string is properly rendered with the Jinja template engine. The get_rendered_sql method in superset/models/helpers.py is responsible for rendering the SQL with the template engine:

def get_rendered_sql(
    self,
    template_processor: Optional[BaseTemplateProcessor] = None,
) -> str:
    """
    Render sql with template engine (Jinja).
    """
    sql = self.sql.strip("\t\r\n; ")
    if template_processor:
        try:
            sql = template_processor.process_template(sql)
        except TemplateError as ex:
            raise QueryObjectValidationError(
                _(
                    "Error while rendering virtual dataset query: %(msg)s",
                    msg=ex.message,
                )
            ) from ex

    script = SQLScript(sql, engine=self.db_engine_spec.engine)
    if len(script.statements) > 1:
        raise QueryObjectValidationError(
            _("Virtual dataset query cannot consist of multiple statements")
        )

    if not sql:
        raise QueryObjectValidationError(_("Virtual dataset query cannot be empty"))
    return sql

Moreover, the make_orderby_compatible method in superset/connectors/sqla/models.py ensures that aliases for selected columns are not used in the ORDER BY clause if the database does not support it:

def make_orderby_compatible(
    self, select_exprs: list[ColumnElement], orderby_exprs: list[ColumnElement]
) -> None:
    """
    If needed, make sure aliases for selected columns are not used in
    `ORDER BY`.

    In some databases (e.g. Presto), `ORDER BY` clause is not able to
    automatically pick the source column if a `SELECT` clause alias is named
    the same as a source column. In this case, we update the SELECT alias to
    another name to avoid the conflict.
    """
    if self.db_engine_spec.allows_alias_to_source_column:
        return

    def is_alias_used_in_orderby(col: ColumnElement) -> bool:
        if not isinstance(col, Label):
            return False
        regexp = re.compile(f"\\(.*\\b{re.escape(col.name)}\\b.*\\)", re.IGNORECASE)
        return any(regexp.search(str(x)) for x in orderby_exprs)

    for col in select_exprs:
        if is_alias_used_in_orderby(col):
            col.name = f"{col.name}__"

By following this approach and ensuring the SQL string is properly rendered and compatible with the database's requirements, you can ensure that Jinja placeholders in calculated columns are properly replaced with the rendered SQL string in the ORDER BY clause [2][3].

To continue talking to Dosu, mention @dosu.