`Error: Referenced columns not available in DataFrame` in Mixed Charts when one query has only nulls

apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform

Apache License 2.0

61.72k stars 13.49k forks source link

Bug description

Creating a Mixed Chart using a dataset that yields only null values for one of the queries will produce an error

Unexpected error
Error: Referenced columns not available in DataFrame.

How to reproduce the bug

Consider the following dataset

-- to test issue with two charts when one dataset has only nulls

with part_a as (
SELECT * 
FROM generate_series(1,100) as id),
part_b as (
  (
    select id, 
    random() as value,
    'type_a' as some_type,
    '2001-10-22 00:00'::timestamp + (part_a.id::text || ' hours')::interval as ts
    from part_a
  )
  union all
  (
    select id, 
    null as value,
    'type_b' as some_type,
    '2001-10-22 00:00'::timestamp + (part_a.id::text || ' hours')::interval as ts
    from part_a)
  )

select * from part_b;

which yields ts, value and some_type with 2 different types. The latter is null for one type and random for the other.

Create a Mixed Chart using this dataset, using these settings, will get an error

Now, if I remove the filter in the query B that yields only null values, then it works (not correctly though, because I wanted it to be filtered)

Screenshots/recordings

No response

Superset version

3.1.3

Python version

I don't know

Node version

I don't know

Browser

Firefox

Additional context

I'm using Superset version 3.0.2.

Checklist

[X] I have searched Superset docs and Slack and didn't find a solution to my problem.
[X] I have searched the GitHub issue tracker and didn't find a similar bug report.
[ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

# pandas will throw away nulls when grouping/pivoting, # so we substitute NULL_STRING for any nulls in the necessary columns filled_cols = groupby + columns df = df.copy() df[filled_cols] = df[filled_cols].fillna(value=NULL_STRING) # Check if the DataFrame is empty after filling nulls if df.empty: raise QueryObjectValidationError( _("The query resulted in an empty DataFrame. Please check your query or filters.") ) sortby = utils.get_metric_name( self.form_data.get("timeseries_limit_metric") or metrics[0] ) row = df.groupby(groupby)[sortby].sum().copy() is_asc = not self.form_data.get("order_desc") row.sort_values(ascending=is_asc, inplace=True) pt = df.pivot_table(index=groupby, columns=columns, values=metrics) if self.form_data.get("contribution"): pt = pt.T pt = (pt / pt.sum()).T pt = pt.reindex(row.index) # Re-order the columns adhering to the metric ordering. pt = pt[metrics] chart_data = [] for name, ys in pt.items(): if pt[name].dtype.kind not in "biufc" or name in groupby: continue if isinstance(name, str): series_title = name else: offset = 0 if len(metrics) > 1 else 1 series_title = ", ".join([str(s) for s in name[offset:]]) values = [] for i, v in ys.items(): x = i if isinstance(x, (tuple, list)): x = ", ".join([str(s) for s in x]) else: x = str(x) values.append({"x": x, "y": v}) chart_data.append({"key": series_title, "values": values}) return chart_data

apache / superset