apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.58k stars 13.79k forks source link

Time granularity generates invalid query for Dremio #20349

Open jbvsmo opened 2 years ago

jbvsmo commented 2 years ago

I'm using Dremio database and it throws a VALIDATION ERROR on any query that tries to group by a time field that has a granularity expression. This might be valid SQL for another database system, but not Dremio. This makes superset useless for Dremio tables since grouping by time is a very common and used feature

How to reproduce the bug

  1. Create a new table chart and pick a timestamp field like 2022-06-10 12:34:56 and select time grain day or month
  2. Select as dimension the same timestamp field
  3. Add any metric such as COUNT(*)
  4. Superset will generate this query:
    SELECT DATE_TRUNC('day', "OperationDate") AS "OperationDate",
       COUNT(*) AS "count"
    FROM "no-partitions-queries"."messages_operations_v"
    GROUP BY DATE_TRUNC('day', "OperationDate")
    ORDER BY "count" DESC
    LIMIT 10000;

    It is not using the name of the column OperationDate in the group by and is rewriting the same expression.

Expected results

to have this clause:

GROUP BY "OperationDate"

Actual results

Query execution error. Details: VALIDATION ERROR: Expression \'messages_operations_v.OperationDate\' is not being grouped

Unexpected error Error: ('HY000', '[HY000] [Dremio][Connector] (1040) Dremio failed to execute the query: SELECT DATE_TRUNC(\'day\', "OperationDate") AS "OperationDate",\n COUNT(*) AS "count"\nFROM "no-partitions-queries"."messages_operations_v"\nGROUP BY DATE_TRUNC(\'day\', "OperationDate")\nORDER BY "count" DESC\nLIMIT 10000\n[30038]Query execution error. Details:[ \nVALIDATION ERROR: Expression \'messages_operationsv.OperationDate\' is not being grouped\n\nSQL Query SELECT DATE...[see log] (1040) (SQLExecDirectW)')

Screenshots

Screen Shot 2022-06-10 at 10 50 01

Environment

Checklist

Make sure to follow these steps before submitting your issue - thank you!

Additional context

Add any other context about the problem here.

rusackas commented 2 years ago

Wondering if @eschutho or @zhaoyongjie have any insight on this issue.

eschutho commented 2 years ago

@jbvsmo I found this post which looks related to this issue: https://community.dremio.com/t/unable-to-group-by-date-trunc/7953

@rusackas I think it makes sense to investigate this further and whether it's indeed a fix that's needed on our side or Dremio, but I think as as workaround you should be able to create a calculated column on your dataset for DATE_TRUNC('day', "OperationDate") and give it a new name and then use that new column for your dimension/group by (you can use the original column in your time dimension). When trying that change, my query now looks like this one, which matches what looks to be the fix for Dremio:

_DEV__Explore_-_Untitled_Query_1_06_14_2022_14_17_27

also cc @betodealmeida for any additional insight.

jbvsmo commented 2 years ago

@eschutho Following up on this, seems like Dremio still doesn't support that syntax and they recommend using the alias on the groupby clause. I asked on that community post you showed and they responded that.

There are a few issues with calculated columns in the dataset, one being annoying to do it for several tables and another that sometimes I want to aggregate by day, month, year, quarter, week... I would have to create many columns when a simply fix of using the correct alias would be enough

eschutho commented 2 years ago

@jbvsmo you can also create an alias/calculated column without the time coercion if you want it to be more flexible. Here's an example of renaming ds to date: _DEV__Superset Then just use date for your time grain and ds for your dimension.

I hear what you're saying about it not being ideal with regard to having to do this on many tables, but it is a temporary workaround for now until Dremio can fix the bug on their end. The code solution on the Superset side around this would need to be something where we alias all time dimensions for Dremio which could get messy, but we're definitely open to any PRs if someone wants to contribute an idea.

meyergin commented 2 years ago

Same issue at Apache drill, maybe need to check database driver to choose native name/ alias name.

mbrannstrom commented 11 months ago

I'm using Apache Drill and having the same problem (as @meyergin also mentioned).

Apache Drill supports GROUP BY both with and without alias. E.g. both these works:

SELECT LEFT(name,2) as name2 FROM ... GROUP BY LEFT(name,2) - OK SELECT LEFT(name,2) as name2 FROM ... GROUP BY name2 - OK

However, when the alias is the same as the column name, we get a different behaviour: SELECT LEFT(name,2) as name FROM ... GROUP BY LEFT(name,2) - FAIL - Method used by Superset SELECT LEFT(name,2) as name FROM ... GROUP BY name - OK

@eschutho's workaround by adding a new calculated column ts, will make superset not use the same alias as the original column. However, this is only reasonable for the time column, and not custom SQL adhoc metrics.

I'm not sure what is "proper SQL", but many engines seem to handle GROUP BY very differently.

In superset/superset/models/helpers.py make_orderby_compatible:

    If needed, make sure aliases for selected columns are not used in
    `ORDER BY`.

    In some databases (e.g. Presto), `ORDER BY` clause is not able to
    automatically pick the source column if a `SELECT` clause alias is named
    the same as a source column. In this case, we update the SELECT alias to
    another name to avoid the conflict.`

So, some adjustment is done for ORDER BY, and it seems like the same is needed for GROUP BY.

rusackas commented 9 months ago

I'm not sure who might be best to take a look at the current state of affairs here, as I don't know of anyone Drill or Dremio volunteers to reproduce or assess.

I might consider filing the Drill issue separately, lest this issue get conflated and stuck.

CC @naren-dremio in case he knows of anyone fitting to take a look at this from the Dremio side.

rusackas commented 5 months ago

Anyone still facing this, or looking to work on it? It's at risk of being closed as stale.

mbrannstrom commented 5 months ago

@rusackas: See issue #28443. The PR #28444 solves my issue with Drill. Just waiting for it to be merged ... and released.

rusackas commented 5 months ago

Adding more reviewers to that PR :)