Open jbvsmo opened 2 years ago
Wondering if @eschutho or @zhaoyongjie have any insight on this issue.
@jbvsmo I found this post which looks related to this issue: https://community.dremio.com/t/unable-to-group-by-date-trunc/7953
@rusackas I think it makes sense to investigate this further and whether it's indeed a fix that's needed on our side or Dremio, but I think as as workaround you should be able to create a calculated column on your dataset for DATE_TRUNC('day', "OperationDate")
and give it a new name and then use that new column for your dimension/group by (you can use the original column in your time dimension). When trying that change, my query now looks like this one, which matches what looks to be the fix for Dremio:
also cc @betodealmeida for any additional insight.
@eschutho Following up on this, seems like Dremio still doesn't support that syntax and they recommend using the alias on the groupby clause. I asked on that community post you showed and they responded that.
There are a few issues with calculated columns in the dataset, one being annoying to do it for several tables and another that sometimes I want to aggregate by day, month, year, quarter, week... I would have to create many columns when a simply fix of using the correct alias would be enough
@jbvsmo you can also create an alias/calculated column without the time coercion if you want it to be more flexible. Here's an example of renaming ds
to date
:
Then just use date
for your time grain and ds
for your dimension.
I hear what you're saying about it not being ideal with regard to having to do this on many tables, but it is a temporary workaround for now until Dremio can fix the bug on their end. The code solution on the Superset side around this would need to be something where we alias all time dimensions for Dremio which could get messy, but we're definitely open to any PRs if someone wants to contribute an idea.
Same issue at Apache drill, maybe need to check database driver to choose native name/ alias name.
I'm using Apache Drill and having the same problem (as @meyergin also mentioned).
Apache Drill supports GROUP BY both with and without alias. E.g. both these works:
SELECT LEFT(name,2) as name2 FROM ... GROUP BY LEFT(name,2)
- OK
SELECT LEFT(name,2) as name2 FROM ... GROUP BY name2
- OK
However, when the alias is the same as the column name, we get a different behaviour:
SELECT LEFT(name,2) as name FROM ... GROUP BY LEFT(name,2)
- FAIL - Method used by Superset
SELECT LEFT(name,2) as name FROM ... GROUP BY name
- OK
@eschutho's workaround by adding a new calculated column ts, will make superset not use the same alias as the original column. However, this is only reasonable for the time column, and not custom SQL adhoc metrics.
I'm not sure what is "proper SQL", but many engines seem to handle GROUP BY very differently.
In superset/superset/models/helpers.py make_orderby_compatible:
If needed, make sure aliases for selected columns are not used in
`ORDER BY`.
In some databases (e.g. Presto), `ORDER BY` clause is not able to
automatically pick the source column if a `SELECT` clause alias is named
the same as a source column. In this case, we update the SELECT alias to
another name to avoid the conflict.`
So, some adjustment is done for ORDER BY, and it seems like the same is needed for GROUP BY.
I'm not sure who might be best to take a look at the current state of affairs here, as I don't know of anyone Drill or Dremio volunteers to reproduce or assess.
I might consider filing the Drill issue separately, lest this issue get conflated and stuck.
CC @naren-dremio in case he knows of anyone fitting to take a look at this from the Dremio side.
Anyone still facing this, or looking to work on it? It's at risk of being closed as stale.
@rusackas: See issue #28443. The PR #28444 solves my issue with Drill. Just waiting for it to be merged ... and released.
Adding more reviewers to that PR :)
I'm using Dremio database and it throws a VALIDATION ERROR on any query that tries to group by a time field that has a granularity expression. This might be valid SQL for another database system, but not Dremio. This makes superset useless for Dremio tables since grouping by time is a very common and used feature
How to reproduce the bug
2022-06-10 12:34:56
and select time grainday
ormonth
COUNT(*)
It is not using the name of the column
OperationDate
in the group by and is rewriting the same expression.Expected results
to have this clause:
Actual results
Query execution error. Details: VALIDATION ERROR: Expression \'messages_operations_v.OperationDate\' is not being grouped
Screenshots
Environment
Checklist
Make sure to follow these steps before submitting your issue - thank you!
Additional context
Add any other context about the problem here.