apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.36k stars 13.69k forks source link

TOTALS incorrect for non-additive metric in pivot table #25747

Open lameapparition opened 12 months ago

lameapparition commented 12 months ago

Hello, I'm new into Superset. Trying to move our analytics to this platform, but i met problem as I see no possibility of making Total row correct for non-countable (or non-additive) metric SuperSet version: 3.0.0

Example: I have three metrics grouped by month:

  1. "spend plan" – amount of money company was planning to spend
  2. "spend fact" – amount of money company actually spended
  3. "spend rate" – ratio of spend fact and spend plan So, I created pivot table and added totals. "spend plan" and "spend fact" are ok – its totals are sums of all values but for "spend rate" I also see sums of all values instead of ratio of "spend fact" and "spend plan" totals image image

Expected result (using Google Spreadsheets):

image

Maybe I'm missing something, hope to get help

UPD. Found that usual table calculates "spend rate" correctly

image

Originally posted by @lameapparition in https://github.com/apache/superset/issues/14061#issuecomment-1777176807

villebro commented 12 months ago

This is a known limitation right now. The solution would be to trigger additional queries for all aggregation levels. By doing so even non-additive metrics would display correctly (this is how the table chart does it). However, the required change is rather complex, and requires good knowledge of how the pivot table chart works. IIRC I've discussed this with @kgabryje a few times a while ago (he's the original author of the chart plugin), but we were never able to find the time to work on implementing this improvement.

mistercrunch commented 8 months ago

There'd be two main ways to approach this:

Both of those would be significant work and add much more complexity to the current SQL-oriented approach.

It could be good though to hide totals for non-additive metrics though, probably through per-metric inference or configuration. If it's a simple metric, we know which aggregation methods are additive and/or why. If it's SQL expression, the user may have to tell us if it's additive or not. A simple quick-fix would be to only show the totals where people use the SUM-type simple metric (which is probably 80%+ of metrics out there).

Interestingly, for things like average we could be smarter, and for things like ratio (ask the db for a SUM(col) and a COUNT(col)) and do our own math. For ratio we could introduce new "not-so-simple" metrics where you specify your numerator metric and denominator metric and we could compute totals properly....

rusackas commented 4 months ago

Hey all,

Just FYI, we have had quite a few issues on this topic. I've closed out a couple of the other ones in favor of this one. lease feel free to do they same as they arise!

felipegranado commented 3 months ago

Hi!

Any update for this? I working in a table and I found same issue about the totals.

Thanks in advance...

goldjee commented 2 months ago

Bumping this because I believe that the ability to aggregate non-additive data is crucial for any BI tool. All of the major tools present on the market can handle that effortlessly.

As we define SQL for measures (and they should be proper SQL aggregates) in dataset options, I think that Superset already has all the required information to calculate subtotals. So,

This approach has an obvious drawback of spamming queries if there is a lot of aggregation levels. But realistically no pivot table would have even a dozen of them, so such an overhead should be tolerable.

@mistercrunch @rusackas @villebro Could we expect such a feature in one of the upcoming releases?