evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in SQL and markdown
https://evidence.dev
MIT License
4.51k stars 218 forks source link

[Bug]: weightedMean aggregation of percent change in DataTables doesn't handle nulls and zeros correctly #2852

Open nickfanion opened 5 days ago

nickfanion commented 5 days ago

Describe the bug

When using totalAgg=weightedMean in DataTables to aggregate percent change, null values and zeroes in the weightCol cause the entire row to be ignored, which leads to incorrect aggregates.

Steps to Reproduce

```sql example_data
select 'Peanut Butter' AS category, 'Chunky Peanut Butter' AS item, 6678 AS sales, 1189 AS sales_ya
union
select 'Peanut Butter', 'Smooth Peanut Butter', 7155, 7651
union
select 'Peanut Butter', 'Powder Peanut Butter', 522, 9928
union
select 'Salad Dressing', 'Thousand Island Dressing', 3210, 0
union
select 'Salad Dressing', 'Ranch Dressing', 9493, 7557
union
select 'Salad Dressing', 'Italian Dressing', 9212, 8051
union
select 'Salad Dressing', 'Caeser Dressing', 2489, 9643
union
select 'Plant-Based Milk', 'Almond Milk', 2867, 5212
union
select 'Plant-Based Milk', 'Oat Milk', 7982, 2430
union
select 'Plant-Based Milk', 'Soy Milk', 9069, 5872
select
  category,
  item,
  sum(sales) as sales,
  sum(sales_ya) as sales_ya,
  ((sum(sales) - sum(sales_ya)) / sum(sales_ya)) as sales_yoy
from ${example_data}
group by all
order by sales DESC NULLS LAST
select
  ((sum(sales) - sum(sales_ya)) / sum(sales_ya)) as sales_yoy
from ${table_data}
group by all

The 'Thousand Island Dressing' row has zero sales_ya. Because it's zero, the weightedMean aggregation ignores the row, despite there being being a value in sales.

### Logs

```Shell
N/A

System Info

N/A

Severity

serious, but I can work around it

Additional Information, or Workarounds

This may be an inherent limitation to weightedMean, but percent change is a common calculation, so I think there should be a way of aggregating it in all cases. I can use a custom aggregation value as a workaround, but then that breaks groupBy aggregates in a DataTable.

Another workaround is to create a weightCol with the zeros set to a very small number like 0.001.

nickfanion commented 5 days ago

Seems to be the reverse issue of #2081 but I understand why handling of zeros/nulls was changed.