apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
63.16k stars 13.99k forks source link

Cannot save a query with Parameters as a Dataset (Jinja template default values) #25786

Open qleroy opened 1 year ago

qleroy commented 1 year ago

A Fatal error is raised when I try to save a query that contains Parameters as a virtual dataset.

How to reproduce the bug

  1. Go to the SQL lab and open a new query, select the default examples database
  2. Click on the three-dots menu and Parameters.
  3. Enter the following JSON to specify defaults values for Jinja templates (relevant doc )
    {
    "my_table": "persons"
    }
  4. Enter the following text as the query
    SELECT * FROM {{ my_table }}

    Observe that you can execute the query, the context you specified in the Parameters is correctly injected because the results appear as expected.

  5. Save the query as a dataset and the Fatal error is raised.

Expected results

We should be able to save the Jinja-templated query as a new virtual dataset.

Actual results

A fatal error is raised.

See below the logs of the superset_app container :

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 127, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 121, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1526, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 255, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/views/base_api.py", line 93, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/datasets/api.py", line 315, in post
    new_model = CreateDatasetCommand(item).run()
  File "/app/superset/datasets/commands/create.py", line 50, in run
    dataset.fetch_metadata(commit=False)
  File "/app/superset/connectors/sqla/models.py", line 1247, in fetch_metadata
    new_columns = self.external_metadata()
  File "/app/superset/connectors/sqla/models.py", line 715, in external_metadata
    return get_virtual_table_metadata(dataset=self)
  File "/app/superset/connectors/sqla/utils.py", line 143, in get_virtual_table_metadata
    raise SupersetGenericDBErrorException(message=str(ex)) from ex
superset.exceptions.SupersetGenericDBErrorException: syntax error at or near "{"
LINE 1: SELECT * FROM {{ my_table }}

Screenshots

sqllab preset sqllab preset Fatal error

Environment

Checklist

Additional context

I followed the docs at https://superset.apache.org/docs/installation/sql-templating/#jinja-templates and the blog post https://preset.io/blog/intro-jinja-templating-apache-superset/

Note that the query is correctly executed in the SQL IDE.

The issue also appears in Preset !

qleroy commented 1 year ago

Minimal config to reproduce the issue with a docker compose deployment

# docker/pythonpath_dev/superset_config_docker.py
FEATURE_FLAGS = {
  "ENABLE_TEMPLATE_PROCESSING": True,
}
qleroy commented 1 year ago

Tested with Superset 2.1.1 and it is working fine.

mishafeingold commented 1 year ago

Same fetal error is happening for us. We upgraded from 2.1.0 to 3.0.1 and we noticed the template variables are not being translated correctly. In the superset logs it appears Superset is literally sending the braces to the database instead of updating the query with the parameters set:

DB::Exception: Syntax error: failed at position 9 ('{') (line 1, col 9): { templatevar }} AS __timestamp,

These are our feature flags:

# feature_flags
FEATURE_FLAGS = {
  "DASHBOARD_CROSS_FILTERS": True,
  "DRILL_TO_DETAIL": True,
  "DRILL_BY": True,
  "ENABLE_TEMPLATE_PROCESSING": True,
  "ENABLE_TEMPLATE_REMOVE_FILTERS": False,
  "DASHBOARD_RBAC": True,
  "GLOBAL_ASYNC_QUERIES": True,
}

We recently switched ENABLE_TEMPLATE_REMOVE_FILTERS to false as it was a change in 3.0 https://preset.io/blog/superset-3-0-release-notes/#feature-flags, but we are still seeing the same issue.

This use to work for us in 2.1.0.

RubingHan commented 1 year ago

This error is due to a change in the logic of saving Datasets in Sqlab as of v3.0.1: in v2.1, this was handled separately within the sqlab_viz function in views/core.py, where templateParams were filled in before passing to the database; whereas in v3.0.1, it is consolidated into the CreateDatasetCommand.run function in datasets/commands/create.py, and the templateParams parameter was not passed along with the SQL to this function.

It is uncertain whether this counts as a BUG since templateParams are indeed not necessary to be passed to the database during the Dataset creation process.

In fact, there is a way to avoid triggering this error: by adding if expressions to these variables, ensuring that the variable expressions are not passed to the database when saving the Dataset. For example, a variable like ds >= {{ to_dttm }} can be modified to:

{% if to_dttm %}
ds >= {{ to_dttm }}
{% endif %}

This way, even in the absence of templateParams, the Dataset can still be saved and executed correctly. An {% else %} statement can also be added to define a default value for when the templateParams are missing.

maxcorbeau commented 10 months ago

It is uncertain whether this counts as a BUG

If it helps save time and focus on the big picture: I'm not convinced your Superset users really care what categorisation you guys apply to this issue. It used to worked, and now it doesn't, so the product is getting worse. Personally the latter part is the part I would focus on.

Vitor-Avila commented 10 months ago

Hey @qleroy,

As synced via Slack, during the dataset creation process the query is evaluated and doesn't have the parameters context defined in SQL Lab (@RubingHan did a much better job explaining this on the code level -- kudos!) resulting in a "broken" SQL query which throws the error.

While you "could" create a dataset with this type of implementation before, you would end up with a "broken" dataset until the SQL parameters are defined in the new dataset (I don't think this was handled before), so you would still face an error, just one step further.

You can avoid this error by adding if/else statements to your SQL query so that the query still works without the parameter values:

select * from 
{% if my_table %}
  {{my_table}}
{% else %}
  foo
{% endif %}

In regards to improving this flow, I think a good long-term solution would be adding a checkbox to the virtual dataset creation modal to ask users if the parameters defined in SQL Lab should be carried over to the virtual dataset, so they're used in the query validation + saved in the dataset level.

qleroy commented 10 months ago

I understand. I find it confusing because if you follow the blog post about Jinja templating it used to work on 2.1 and it is now broken since 3.0

It was my understanding that the parameters in the SQLLab would carry over as parameters for the virtual dataset because I always use the SQLLab to create datasets ultimately.

A checkbox would be much appreciated.

At the very least the notification Fatal error should provide clarification about what is going on.

I will dive into this to suggest a PR to resolve these aspects. 👍

Thank you very much !

rusackas commented 6 months ago

I think this issue might be a duplicate? And maybe this one? Would love to consolidate issues/threads if possible.

amosanurag commented 5 months ago

The issue appears when you try to use jinja templating and save the query in SQLLab as a dataset. If you save the dataset and provide a default value in place of using any jinja templates.Then during the chart development process, you can make updates to the query and enter your jinja template. This won't lead to errors.

cherns31 commented 4 months ago

Thanks you all for the help. I used the { if }workaround and now it works. Really annoying they haven't solved this yet.

I'm using {{ filter_values('xyz') }} in my queries and they were working properly when passing the _filters parameters in the SQL Lab, but now it no longer works if I add them to the dataset's Template parameters sections under Settings. Any idea why? It just gets my default { else } values.

Meena0155 commented 2 months ago

@cherns31 .. Did you end up solving this issue.. I am facing the same one

cherns31 commented 2 months ago

I didn't. I was using a parametrized UDTF so I decided to set the defaults in the SQL definition and pass nothing when filter_values aren't available