apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Add functions for statistical analysis in SQL #8493

Open jasperjiaguo opened 2 years ago

jasperjiaguo commented 2 years ago

As discussed with @siddharthteotia, consider adding some common statistical analysis methods to the SQL language.

A few examples:

  1. Pearson's coefficient
  2. Sampling (bernoulli/stratified)
  3. Histogram
  4. Entropy
  5. Linear regression
  6. Logistic regression
  7. SVM
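
To make the first two items concrete, here is a rough sketch in Python, not Pinot code. It assumes Pearson's coefficient would be computed from partial sums that each segment or server can build independently and then merge, and that Bernoulli sampling keeps each row independently with probability `p`. The names `PearsonState` and `bernoulli_sample` are hypothetical and do not exist in Pinot.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PearsonState:
    """Hypothetical mergeable partial aggregate for Pearson's r."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        # Accumulate the six running sums for one (x, y) row.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def merge(self, other):
        # Partial states from different segments combine by plain addition.
        return PearsonState(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )

    def result(self):
        # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return float("nan")  # undefined when either column is constant
        return cov / math.sqrt(var_x * var_y)


def bernoulli_sample(rows, p, seed=None):
    """Keep each row independently with probability p."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]
```

The sums-based formula is easy to merge but can lose precision when values are large relative to their spread; a production version would probably need a more numerically stable update.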

jasperjiaguo commented 2 years ago

Designing a model in which one request runs multiple (sequential) queries for the statistical functions. Planning to use mini-batch stochastic gradient descent for regression algorithms 2. 3. 4.
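
For readers unfamiliar with the technique, a minimal sketch of mini-batch stochastic gradient descent for one-variable linear regression follows. It is plain Python and not the proposed Pinot design. Treating each epoch as one of the sequential queries is an assumption made for illustration, and `minibatch_sgd_linear` and its parameters are hypothetical names.

```python
import random


def minibatch_sgd_linear(rows, epochs=500, batch_size=4, lr=0.1, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on mean squared error.

    rows is a list of (x, y) pairs. Each epoch is one full pass over the
    data, standing in here for one sequential query (an assumption).
    """
    rng = random.Random(seed)
    data = list(rows)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit rows in a fresh random order each pass
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for x, y in batch:
                err = (w * x + b) - y
                grad_w += 2.0 * err * x  # d/dw of err^2
                grad_b += 2.0 * err      # d/db of err^2
            # Step against the gradient averaged over the batch.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b
```

With `rows = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]` the returned `w` and `b` should approach 2 and 1. The learning rate and epoch count here suit inputs scaled to roughly [0, 1]; unscaled columns would need different settings or feature normalization.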

siddharthteotia commented 2 years ago

Supporting histogram- and entropy-like computations could also be useful.
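
As an illustration of what these two computations involve, here is a small sketch in plain Python, not Pinot code. It assumes an equal-width histogram over a caller-supplied range and Shannon entropy in bits computed from counts; `histogram` and `entropy` are hypothetical helper names.

```python
import math


def histogram(values, lower, upper, num_bins):
    """Equal-width bin counts over [lower, upper]; other values are ignored."""
    width = (upper - lower) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lower <= v <= upper:
            # A value equal to upper falls into the last bin.
            idx = min(int((v - lower) / width), num_bins - 1)
            counts[idx] += 1
    return counts


def entropy(counts):
    """Shannon entropy in bits of the distribution implied by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```

Both work from counts, so per-segment results can be merged by adding the count arrays before the entropy is computed.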

shahharshil46 commented 5 months ago

Is anyone working on supporting sampling? Do we know how much effort it is going to be? Will it be a few days or weeks?

@jasperjiaguo @siddharthteotia