Open paulobitar opened 8 months ago
Anyone still facing this? I don't know of anyone using Impala, so this is hard to troubleshoot, and is at risk of being closed as stale. Perhaps before that happens, @dosu-bot can help a bit.
Hey, @rusackas! 👋 I'm here to help you out while we wait for a human maintainer. I'm a bot that can assist you with bugs, questions, and becoming a contributor.
To resolve the issue where the generated SQL query for an Impala data source does not include the field name in the GROUP BY
clause when truncating dates, you need to ensure that the field name is included in the GROUP BY
clause. This can be done by modifying the query generation logic to include the field name explicitly.
Here is the relevant part of the code that constructs the GROUP BY
clause:
if groupby_all_columns:
qry = qry.group_by(*groupby_all_columns.values())
To ensure that the field name is included, you need to make sure that groupby_all_columns
is populated correctly. This is done in the following section:
if need_groupby:
# dedup columns while preserving order
columns = groupby or columns
for selected in columns:
if isinstance(selected, str):
# if groupby field/expr equals granularity field/expr
if selected == granularity:
table_col = columns_by_name[selected]
outer = table_col.get_timestamp_expression(
time_grain=time_grain,
label=selected,
template_processor=template_processor,
)
# if groupby field equals a selected column
elif selected in columns_by_name:
outer = self.convert_tbl_column_to_sqla_col(
columns_by_name[selected],
template_processor=template_processor,
)
else:
selected = validate_adhoc_subquery(
selected,
self.database_id,
self.schema,
)
outer = literal_column(f"({selected})")
outer = self.make_sqla_column_compatible(outer, selected)
else:
outer = self.adhoc_column_to_sqla(
col=selected, template_processor=template_processor
)
groupby_all_columns[outer.name] = outer
if (
is_timeseries and not series_column_labels
) or outer.name in series_column_labels:
groupby_series_columns[outer.name] = outer
select_exprs.append(outer)
Ensure that the outer
variable, which represents the column to be included in the GROUP BY
clause, is correctly assigned and added to the groupby_all_columns
dictionary. This will ensure that the field name is included in the GROUP BY
clause, thus resolving the AnalysisException
[1].
Additionally, you can customize the SQL query generation using the SQL_QUERY_MUTATOR
configuration option in the superset/config.py
file. This function allows you to modify the SQL query string before it is executed:
def SQL_QUERY_MUTATOR(sql: str, **kwargs: Any) -> str:
return sql
By implementing this function, you can ensure that the GROUP BY
clause includes the necessary field names based on your specific requirements [2].
To continue talking to Dosu, mention @dosu.
Bug description
When I try to create a time Graphs, I get an erro message "AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): trunc(field_name, 'MONTH'). Iḿ connecting to Impala. I Got the SQL generated and the problem is that Superset doesn´t include the field name.
This is the Query:
When I tried to run this query on impala, I saw that the correct would be:
How to reproduce the bug
Create a time graph (any kind, it just can´t be the legacy one) Choose a Impala dataset in the x-axis, select a datetime field Select a metric. Update the chart.
inspect the query.
Go to sql labs and change the query, including the name of the field used in trunc.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist