Open sebastian opened 5 years ago
Note: this is the last aggregate we need to support in order to be able to completely delete the uid
-based anonymization pipeline. This will allow us to simplify and standardize the cloak internals (a worthy goal if you ask me!)
There is also the problem of min/max
over date/time
values.
A design was present in the original
no-uid approach
issue. I am unable to link directly to the section, but you get there by searching for "Standard Deviation of a column".Excerpt:
Analyst query:
Stats-based query:
Note that the average needed for the standard deviation computation is computed as a SELECT embedded in the abs function. My postgres is smart enough to store the computed value and re-use, but not every implementation might.
There is a small difference in the answers of the analyst and no-uid queries here. The analyst query returns 9470.0803, while the no-uid query returns 9470.0766. So not sure I'm really doing the computation right here.
Note that the amount of noise added here would be proportional to the max or avg divided by the number of distinct users (so max 5.1 and avg 1.5).
Further note: the SQL
with
-feature might be useful to extract the calculation of the average?