apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.61k stars 13.1k forks source link

Drill to detail gives error on some charts when using Druid. #27416

Open iercan opened 4 months ago

iercan commented 4 months ago

Bug description

We recently upgraded Superset from version 2.1.1 to 3.1.1 and encountered errors with certain charts when attempting to access the drill-down detail window.

How to reproduce the bug

Error arises when attempting to open the drill-down detail popup window.

Screenshots/recordings

An example chart that gives error. It is a line chart with some advance analytic options.
image

Error shown on web image

Superset log

Traceback (most recent call last):
  File "/app/superset/connectors/sqla/models.py", line 1795, in query
    df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
  File "/app/superset/models/core.py", line 612, in get_df
    self.db_engine_spec.execute(cursor, sqls[-1])
  File "/app/superset/db_engine_specs/base.py", line 1596, in execute
    raise cls.get_dbapi_mapped_exception(ex) from ex
  File "/app/superset/db_engine_specs/base.py", line 1594, in execute
    cursor.execute(query)
  File "/usr/local/lib/python3.9/site-packages/pydruid/db/api.py", line 62, in g
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pydruid/db/api.py", line 256, in execute
    first_row = next(results)
  File "/usr/local/lib/python3.9/site-packages/pydruid/db/api.py", line 362, in _stream_query
    msg = "{error} ({errorClass}): {errorMessage}".format(**payload)
KeyError: 'errorClass'

Superset version

3.1.1

Python version

3.9

Node version

16

Browser

Chrome

Additional context

druid version: 27.0.0 pydruid: 0.6.5

Checklist

rusackas commented 4 months ago

Hmmm.... I'm unable to reproduce this so far on the latest master (or pretty close to it, anyway). Are you able to make a chart with built-in example data, so we can try to reproduce this effectively?

iercan commented 4 months ago

It occurs only on druid datasources. Unfortunately I'm not able to reproduce it with example data.

rusackas commented 4 months ago

Ahh... ok, that helps. Let me clarify the title and labeling.

iercan commented 2 months ago

This issue still exists on 4.0.0 fyi

rusackas commented 2 months ago

Pinging the Druid folks from the rolodex for potential reproduction/debugging:

@betodealmeida @sujiplr @nytai @villebro

saidur2k commented 2 months ago

I have ran into the same issue. The failure pattern is when I have a UNION ALL, CTE, etc. the charts add a SUM over the metric.

Copy pasting the query from the Superset view query into DRUID returns "Error: UNSUPPORTED (OPERATOR)

Unhandled Query Planning Failure, see broker logs for details"

Is there a way to generate charts over the SQL query directly instead of aggregating again in Druid?

rusackas commented 2 months ago

Let's see what the robots think... @dosu-bot

dosubot[bot] commented 2 months ago

The issue you're experiencing with the "Error: 'errorClass'" when using drill-down details in Superset charts with Druid datasources appears to be related to how error messages are handled between the backend and the frontend. This problem might be due to changes in error handling practices in newer versions of Superset, which could expect different error structures or handle errors differently compared to older versions.

For the "Error: 'errorClass'" issue, it's important to ensure that any customizations, including visualizations or plugins, are updated to be compatible with the latest Superset error handling practices. This includes verifying that the frontend correctly interprets SupersetErrorType values and that any mappings of errors from the backend to the frontend are complete and up-to-date. If you've made customizations to either the frontend or backend, reviewing these changes for compatibility with the current version of Superset is crucial.

Regarding the "Error: UNSUPPORTED (OPERATOR)" message you encounter when executing complex SQL queries directly in Druid, it suggests that the query includes features or operators not supported by Druid's SQL dialect. You might need to adjust the query to fit within Druid's supported SQL features or consider using Druid's native query language for complex operations.

Keeping Superset and its dependencies, including pydruid, updated is also recommended to avoid or resolve issues, as continuous improvements and fixes are made.

This issue seems to be relatively unique or new, as there isn't much discussion or reporting on it within the community. This suggests that resolving it might require specific troubleshooting or potentially reporting it as a new issue if it persists despite following general troubleshooting steps.

If the problem continues, providing specific details about your setup and the troubleshooting steps you've already taken could help identify more targeted solutions or facilitate further investigation by the Superset community.

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

mistercrunch commented 2 months ago

There's clearly a pydruid bug here - while surfacing another problem. It would help to see the underlying SQL. You should be able to click the [View Query] in the context menu, maybe there's something in that SQL that Druid doesn't like.

mistercrunch commented 2 months ago

@betodealmeida and me are both pydruid maintainers and could help address the issue there, where we should not trust that the payload has bot have certain keys like we seem to do given the stacktrace.

StephanPF commented 1 week ago

I get the same issue. My datasource is in Druid. If I copy paste the query in Druid it works fine.

Example of query:

SELECT TIME_FLOOR(CAST("__time" AS TIMESTAMP), 'P1M') AS "__time",
       sum("NET_AMOUNT") AS "SUM(NET_AMOUNT)"
FROM "druid"."CL_Revenue"
GROUP BY TIME_FLOOR(CAST("__time" AS TIMESTAMP), 'P1M')
LIMIT 50000;

@mistercrunch let me know if I can help

mistercrunch commented 1 week ago

This will fix the error that happens while reporting the other error. I submitted this just because it's an easy fix: https://github.com/druid-io/pydruid/pull/322

This will only enable us to have more visibility into the route error that triggers this, but without it we don't have a good stacktrace. It'd be great if someone can path their local with that quick fix so that we can see the actual error that pydruid hits

mistercrunch commented 1 week ago

For people who have a druid env setup, you may be able to do a quick manual fix in your python environment with the edit in my PR to get more intel and share it here.

@StephanPF the query you shared does not look like a drill-to-details-type query. I'd expect to see something more like a SELECT * with current filters/predicates applied and no GROUP BY clause

StephanPF commented 1 week ago

@mistercrunch happy to help here but I'll need some guidance. Do you mind reaching out in PM?

Note: The query from my previous message was "View query" not "Drill to details" - I'm not sure how to get the drill to details query..

mistercrunch commented 1 week ago

One way to log the query to stdout is to add this to your superset_config.py

def SQL_QUERY_MUTATOR(  # pylint: disable=invalid-name,unused-argument
    sql: str, **kwargs: Any
) -> str:
    print("DEBUG: " + sql)
    return sql

I'm pretty sure the drill-to-details query - as all other queries - should go through this SQL_QUERY_MUTATOR, and here we're just printing the query to catch it - assuming you have a dev environment setup connect to druid.

One could also probably monitor on the druid side and see what queries hit the system when you happen to click on drill-to-details.