apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
61.72k stars 13.49k forks source link

`Error: Referenced columns not available in DataFrame` in Mixed Charts when one query has only nulls #29933

Open diegoquintanav opened 1 month ago

diegoquintanav commented 1 month ago

Bug description

Creating a Mixed Chart using a dataset that yields only null values for one of the queries will produce an error

Unexpected error
Error: Referenced columns not available in DataFrame.

How to reproduce the bug

Consider the following dataset

-- to test issue with two charts when one dataset has only nulls

with part_a as (
SELECT * 
FROM generate_series(1,100) as id),
part_b as (
  (
    select id, 
    random() as value,
    'type_a' as some_type,
    '2001-10-22 00:00'::timestamp + (part_a.id::text || ' hours')::interval as ts
    from part_a
  )
  union all
  (
    select id, 
    null as value,
    'type_b' as some_type,
    '2001-10-22 00:00'::timestamp + (part_a.id::text || ' hours')::interval as ts
    from part_a)
  )

select * from part_b;

which yields ts, value and some_type with 2 different types. The latter is null for one type and random for the other.

Create a Mixed Chart using this dataset, using these settings, will get an error

image image

Now, if I remove the filter in the query B that yields only null values, then it works (not correctly though, because I wanted it to be filtered)

image

image

Screenshots/recordings

No response

Superset version

3.1.3

Python version

I don't know

Node version

I don't know

Browser

Firefox

Additional context

I'm using Superset version 3.0.2.

Checklist

dosubot[bot] commented 1 month ago

Hey @diegoquintanav! I'm here to help you with the issue you're facing. I'm a bot that can assist you in solving bugs, answering questions, and becoming a contributor. Let me know how I can assist you!

To resolve the "Referenced columns not available in DataFrame" error when creating a Mixed Chart in Superset version 3.1.3, which occurs when one of the queries yields only null values, you can modify the code to handle cases where the DataFrame is empty or contains only null values. Specifically, you can add a check to ensure that the DataFrame is not empty before proceeding with operations that assume the presence of data.

Here is a code snippet that demonstrates how to handle this situation:

# pandas will throw away nulls when grouping/pivoting,
# so we substitute NULL_STRING for any nulls in the necessary columns
filled_cols = groupby + columns
df = df.copy()
df[filled_cols] = df[filled_cols].fillna(value=NULL_STRING)

# Check if the DataFrame is empty after filling nulls
if df.empty:
    raise QueryObjectValidationError(
        _("The query resulted in an empty DataFrame. Please check your query or filters.")
)

sortby = utils.get_metric_name(
    self.form_data.get("timeseries_limit_metric") or metrics[0]
)
row = df.groupby(groupby)[sortby].sum().copy()
is_asc = not self.form_data.get("order_desc")
row.sort_values(ascending=is_asc, inplace=True)
pt = df.pivot_table(index=groupby, columns=columns, values=metrics)
if self.form_data.get("contribution"):
    pt = pt.T
    pt = (pt / pt.sum()).T
pt = pt.reindex(row.index)

# Re-order the columns adhering to the metric ordering.
pt = pt[metrics]
chart_data = []
for name, ys in pt.items():
    if pt[name].dtype.kind not in "biufc" or name in groupby:
        continue
    if isinstance(name, str):
        series_title = name
    else:
        offset = 0 if len(metrics) > 1 else 1
        series_title = ", ".join([str(s) for s in name[offset:]])
    values = []
    for i, v in ys.items():
        x = i
        if isinstance(x, (tuple, list)):
            x = ", ".join([str(s) for s in x])
        else:
            x = str(x)
        values.append({"x": x, "y": v})
    chart_data.append({"key": series_title, "values": values})
return chart_data

This code snippet includes a check to raise a QueryObjectValidationError if the DataFrame is empty after filling null values, which should help prevent the "Referenced columns not available in DataFrame" error [1].

To continue talking to Dosu, mention @dosu.