dbt-labs / metricflow

MetricFlow allows you to define, build, and maintain metrics in code.
https://docs.getdbt.com/docs/build/about-metricflow

[SL-1967] Add support for statistical aggregate functions #1111

Open tlento opened 4 months ago

tlento commented 4 months ago

There are a number of straightforward statistical aggregate functions which we should be able to support without too much effort, although as always we have to make some decisions.

There is a current request for var_samp and covar_samp for BigQuery, but there are others we could add to this list.

Statistical aggregate functions recommended for consideration

  1. Sample variance var_samp
  2. Sample covariance covar_samp (multi-argument, not natively supported by Redshift)
  3. Sample standard deviation stddev_samp
  4. Population variance var_pop
  5. Population covariance covar_pop (multi-argument, not natively supported by Redshift)
  6. Population standard deviation stddev_pop
  7. Correlation coefficient: corr (multi-argument, not natively supported by Redshift)

Statistical aggregate functions NOT under consideration

  1. Kurtosis: kurtosis (not natively supported by BigQuery, Postgres, Redshift)
  2. Skewness: skewness (skew in Snowflake, not natively supported by BigQuery, Postgres, Redshift)

Native implementations are missing from too many engines to justify the effort for these, especially given how little use they're likely to see.

Overall recommendation

Start with the functions supported across all engines (var_samp, stddev_samp, var_pop, stddev_pop). These are much more straightforward to develop and test because they fit into our existing aggregate function model.

Separately, evaluate whether or not to bother with custom native SQL implementations of the covariance and correlation functions for Redshift. These are also more complex because they would be the first multi-input aggregate functions we support.
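
To give a sense of what a custom implementation might involve, here is a rough sketch of sample covariance rewritten using only SUM and CASE, via the identity covar_samp(x, y) = (sum(x*y) - sum(x)*sum(y)/n) / (n - 1). The names `covar_samp_sql` and `demo` are hypothetical and not part of MetricFlow. SQLite is used only as a stand-in engine (it has no covar_samp either) to check the arithmetic. Type handling on Redshift is an assumption that would need to be verified there, as is the choice to skip rows where either input is NULL.

```python
import sqlite3


def covar_samp_sql(x: str, y: str) -> str:
    """Build a SQL expression for sample covariance from SUM and CASE only.

    Hypothetical helper for illustration. Rows where either input is NULL
    are skipped, which is assumed to match the native two-argument aggregates.
    """
    pair = f"({x} IS NOT NULL AND {y} IS NOT NULL)"
    n = f"SUM(CASE WHEN {pair} THEN 1 ELSE 0 END)"
    sum_x = f"SUM(CASE WHEN {pair} THEN {x} END)"
    sum_y = f"SUM(CASE WHEN {pair} THEN {y} END)"
    sum_xy = f"SUM(CASE WHEN {pair} THEN {x} * {y} END)"
    # "* 1.0" forces non-integer division; the NULLIF guards turn a division
    # by zero into NULL when there are fewer than two usable pairs.
    return (
        f"({sum_xy} - {sum_x} * {sum_y} * 1.0 / NULLIF({n}, 0))"
        f" / NULLIF({n} - 1, 0)"
    )


def demo() -> float:
    """Evaluate the rewritten expression on a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x REAL, y REAL)")
    rows = [(1, 2), (2, 4), (3, 6), (4, 9), (5, None)]  # last row is skipped
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    (value,) = conn.execute(f"SELECT {covar_samp_sql('x', 'y')} FROM t").fetchone()
    conn.close()
    return value
```

The same sums, plus sum(x*x) and sum(y*y), would be enough for covar_pop and corr. One caveat: this single-pass form can lose precision through cancellation when values are large relative to their spread, which is another thing to weigh when deciding whether the Redshift rewrite is worth it.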


tlento commented 4 months ago

Note - this is closely related to, and possibly a pre-requisite for, #52