apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.81k stars 13.87k forks source link

[ENABLE_TEMPLATE_PROCESSING] BaseTemplateProcessor interface (process_template) is misused #28218

Open tseruga opened 6 months ago

tseruga commented 6 months ago

Bug description

Unsure of when this regression happened, but it likely happened part of #26476 or #27470

Within the BaseTemplateProcessor base class, we have the following method interface for process_template():

https://github.com/apache/superset/blob/52f8734662cdeadba1a4ae80be3e2c74ad8f85a9/superset/jinja_context.py#L493-L504

Specifically, the type of sql should be a str.

It appears that this contract is being broken after one or more of the above changes. Instead, this method (when implemented and assigned to an engine) will sometimes be passed a jinja2.nodes.Template type object.

One example of this is here: https://github.com/apache/superset/blob/52f8734662cdeadba1a4ae80be3e2c74ad8f85a9/superset/sql_parse.py#L1524-L1579

We see that we grab the template processor, then convert the raw str to a Template object:

    processor = get_template_processor(database)
    template = processor.env.parse(sql)

This Template object is then passed to the process_template() method:


    return (
        tables
        | ParsedQuery(
            sql_statement=processor.process_template(template),
            engine=database.db_engine_spec.engine,
        ).tables
    )

Which breaks the contract set by the BaseTemplateProcessor interface.

The example implementation of a custom template processor also makes the assumption that the argument will always be a string as shown here:

https://github.com/apache/superset/blob/master/docs/docs/configuration/sql-templating.mdx?plain=1#L123-L149

I believe this example implementation will also break upon receiving a Template object instead of a string.

How to reproduce the bug

  1. Enable ENABLE_TEMPLATE_PROCESSING in config.py
  2. Implement a custom template processor extending the BaseTemplateProcessor
  3. Within process_template() perform a string operation Ex:
    class ExampleTemplateProcessor(BaseTemplateProcessor):
    def process_template(self, sql: str, **kwargs: Any) -> str:
        return sql.upper()
  4. Wire up the new template processor an engine that you can query (I used sqllite since that's what the example data uses)
    DEFAULT_PROCESSORS = {
    "presto": PrestoTemplateProcessor,
    "hive": HiveTemplateProcessor,
    "trino": TrinoTemplateProcessor,
    "sqlite": ExampleTemplateProcessor,
    }

    I added this to the list of DEFAULT_PROCESSORS for testing, but this also happens when using CUSTOM_TEMPLATE_PROCESSORS via the config.

  5. Attempt to query in sql labs - this should work as expected:
  6. Go to Sql Labs -> Query History
  7. Observe that the template processor now fails since it is passed a Template object
2024-04-25 11:22:34,467:ERROR:superset.views.base:'Template' object has no attribute 'upper'
Traceback (most recent call last):
...
  File "...superset/sql_parse.py", line 1576, in extract_tables_from_jinja_sql
    sql_statement=processor.process_template(template),
  File "...superset/jinja_context.py", line 665, in process_template
    return sql.upper()
AttributeError: 'Template' object has no attribute 'upper'

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.10

Node version

16

Browser

Firefox

Additional context

Feature flag: ENABLE_TEMPLATE_PROCESSING

Checklist

tseruga commented 6 months ago

Another finding that might help - if ENABLE_TEMPLATE_PROCESSING is turned off, another breakage in the contract shows up here:

https://github.com/apache/superset/blob/master/superset/models/sql_lab.py#L72-L83