Open wg21bot opened 5 years ago
P1708R0 Simple Statistical Functions For the Standard Library: Direction Review
Champion: Phillip Ratzloff
Minute Taker: Vincent Reverdy
Start Overview: 07-18 10:40
Add range versions of these algorithms.
Specifying the intermediate type with a template parameter seems problematic. Instead, add a three-argument version that takes an initial value (and uses the type of that initial value as the intermediate type).
Bikeshed all the names.
Rolling algorithm versions?
median should not require pre-sorting; it can be implemented more efficiently with nth_element.
Having median return a pair of iterators is a usability issue.
Why is it useful to get the range of the median? Why do you want the iterator to the median, instead of the value?
This paper should be using ForwardIterators, not InputIterators.
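The median guidance above (no pre-sorting, use nth_element, return a value rather than iterators) can be illustrated with a minimal sketch. The name `median_value`, the `double` return type, and the NaN result for an empty range are assumptions made for illustration; none of them come from P1708.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not the proposal's interface.
// Copies the input into temporary storage so the caller's range is
// neither required to be sorted nor reordered.
template <typename ForwardIt>
double median_value(ForwardIt first, ForwardIt last) {
    std::vector<double> tmp(first, last);
    if (tmp.empty()) {
        // Assumption for this sketch: an empty range yields NaN.
        return std::numeric_limits<double>::quiet_NaN();
    }
    const auto mid = tmp.begin() + static_cast<std::ptrdiff_t>(tmp.size() / 2);
    // Places the element that would be at `mid` in sorted order,
    // with nothing greater before it; linear time on average.
    std::nth_element(tmp.begin(), mid, tmp.end());
    const double upper = *mid;
    if (tmp.size() % 2 == 1) {
        return upper;
    }
    // Even count: the lower middle is the largest element left of `mid`.
    const double lower = *std::max_element(tmp.begin(), mid);
    return lower + (upper - lower) / 2;
}
```

Because the work happens on a copy, the sketch only needs a single pass over the input, at the cost of the temporary storage mentioned in the later poll.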
Options for intermediate type APIs.
// 0
template <typename T = double, typename I>
T mean(I f, I l);
// 1
template <typename I, typename T>
T mean(I f, I l, T sum = /* iterator value type */, std::size_t n = 0);
Issues to discuss/poll on:
Start Review: 10:55
Start Polling: 11:00
POLL: We should promise more committee time to pursuing simple statistical sequence algorithms in the standard library, knowing that our time is scarce and this will leave less time for other work.
NO OBJECTION TO UNANIMOUS CONSENT.
Attendance: 15
More discussion happened.
std::mean(v)
std::accumulate(v) / std::distance(v)
POLL: We should promise more committee time to pursuing std::mean in the standard library, knowing that our time is scarce and this will leave less time for other work.
Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
---|---|---|---|---|
2 | 3 | 4 | 5 | 1 |
Attendance: 17
More discussion happened.
More Polling: 11:47
POLL: We should promise more committee time to pursuing convenient versions of std::mode and std::median that return values not positions, require temporary storage, and do not require their input to be sorted, knowing that our time is scarce and this will leave less time for other work.
Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
---|---|---|---|---|
0 | 4 | 7 | 5 | 0 |
Attendance: 17
POLL: We should promise more committee time to pursuing P1708R0, knowing that our time is scarce and this will leave less time for other work.
Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
---|---|---|---|---|
3 | 6 | 3 | 4 | 0 |
Attendance: 17
End: 11:57
Referral to SG6 for numerics review.
Conor Hoekstra and Vincent Reverdy will help the author out with the next revision.
CONSENSUS: Bring a revision of P1708R0, with the guidance below, to LEWGI for further direction review.
Friends,
Just got a boost version of this up and running:
https://github.com/boostorg/math/pull/248
This implementation (just of the mean, for now) brings a couple of things I think would be very useful: namely, C++17 parallel execution policies and the projections from Eric Niebler's ranges library. (I still do not think I've extracted anywhere near the full power of ranges, but the perfect is the enemy of the good, as they say.)
As to the comment that mean should be done via std::accumulate(v) / std::distance(v): I think this is not wrong, but suboptimal. See:
Robert F Ling. Comparison of several algorithms for computing sample means and variances. Journal of the American Statistical Association, 69(348): 859–866, 1974
The algorithm in Boost for the mean is also discussed by Higham in "Accuracy and Stability of Numerical Algorithms". I think it's valuable to have since we cannot expect most people to understand why it's a good idea to do this. In addition, the ideas in this algorithm extend to stable methods of computing variance, skewness, and kurtosis, as well as parallelizable, single pass bivariate statistics. See:
Janine Bennett, Ray Grout, Philippe Pébay, Diana Roe, and David Thompson. Numerically stable, single-pass, parallel statistics algorithms. In 2009 IEEE International Conference on Cluster Computing and Workshops, pages 1–8. IEEE, 2009
Once the expectation is that we deploy Bennett's algorithm, we're well beyond what we can expect an average user to do correctly, so I'd say this would be a nice addition to the standard.
As a general note, this github issue tracker is not for technical discussions, but for paper management / progress tracking only. Please post your technical discussions to the appropriate reflector.
P1708R2 Simple Statistical Functions (Michael Wong, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)
Prague pre-meeting telecon: This needs review by SG6 as well.
We looked at this in SG6 Monday in Prague, but without the principal author. We felt that the inclusion of the median and mode clouded the interface, but we think there's room for a class used with accumulate to collect the statistical moments, templated on the number of moments to collect.
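One way to picture the SG6 suggestion is a small accumulator type, templated on the number of moments, that plugs into std::accumulate. The sketch below is only an illustration of that shape: the name `raw_moments`, its members, and the use of plain power sums (which are simple but not the numerically stable formulation) are all assumptions, not anything SG6 or the paper specified.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical accumulator: tracks the sums of x, x^2, ..., x^N.
template <std::size_t N, typename T = double>
struct raw_moments {
    std::size_t count = 0;
    std::array<T, N> power_sums{};  // power_sums[k] holds the sum of x^(k+1)

    // k-th raw moment for 1 <= k <= N; requires count > 0.
    T moment(std::size_t k) const {
        return power_sums[k - 1] / static_cast<T>(count);
    }
};

// Lets std::accumulate fold one sample at a time into the accumulator.
template <std::size_t N, typename T>
raw_moments<N, T> operator+(raw_moments<N, T> acc, T x) {
    ++acc.count;
    T power = x;
    for (std::size_t k = 0; k < N; ++k) {
        acc.power_sums[k] += power;
        power *= x;
    }
    return acc;
}

// Convenience wrapper showing the intended call shape.
template <std::size_t N>
raw_moments<N> collect_moments(const std::vector<double>& data) {
    return std::accumulate(data.begin(), data.end(), raw_moments<N>{});
}
```

With `collect_moments<2>`, the mean is `moment(1)` and the population variance follows as `moment(2) - moment(1) * moment(1)`, all from a single pass.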
P1708R2 Simple Statistical Functions: Direction Review
Chair: Billy Baker
Champion: Ryan McDougall
Minute Taker: David Olsen
Start Review: 2020-02-11 10:01
Prior art:
Volunteers to help the author revise the proposal/people to contact:
End: 10:08
CONSENSUS: Further revision is needed before LEWGI can review this.
P1708R3 Simple Statistical Functions (Richard Dosselmann, Michael Chiu, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)
From @fraggamuffin :
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1708r3.pdf (wg21.link/P1708). Passed SG19 review at the December 2020 meeting. Directing to SG6 and LEWG for review.
This document proposes an extension to the C++ library, to support simple statistical functions. Such functions, not presently found in the standard (including the special math library), frequently arise in scientific and industrial, as well as general, applications. These functions do exist in Python [1], the foremost competitor to C++ in the area of machine learning, along with Calc [2], Excel [3], Julia [4], MATLAB [5], PHP [6], R [7], Rust [8], SAS [9], SPSS [10] and SQL [11]. Further need for such functions has been identified as part of SG19 (machine learning) [12]. This is not the first proposal to move statistics in C++. In 2004, a number of statistical distributions were proposed in [13]. More such distributions followed in 2006 [14]. Statistical distributions ultimately appeared in the C++11 standard [15]. Distributions, along with statistical tests, are also found in Boost [16]. A series of special mathematical functions later followed as part of the C++17 standard [17]. A C library, GNU Scientific Library [18], further includes support for statistics, special functions and histograms.
P1708R4 Simple Statistical Functions (Richard Dosselmann, Michael Chiu, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)
P1708R5 Simple Statistical Functions (Richard Dosselmann, Michael Chiu, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)
SG6 will look at this first, and they can send it to Library Evolution when they feel it is ready.
P1708R6 Simple Statistical Functions (Richard Dosselmann, Michael Chiu, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)
POLL: Any objection to unanimous consent to forward a new revision of P1708R6 containing the discussed changes to LEWG?
No objections to unanimous consent.
# of Authors: 2 # of Participants: 6
Design questions raised in SG6 which could be of interest to LEWG:
Any sample implementation for P1708 so far?
P1708R8: Basic Statistics
2024-03-20 Library Evolution Tokyo Minutes
Champion: Richard Dosselmann Chair: Ben Craig Minute Taker: Steve Downey
POLL: Facilities to compute basic statistics (mean, stddev, etc) belong in the standard library
SF | WF | N | WA | SA |
---|---|---|---|---|
14 | 4 | 5 | 0 | 1 |
Attendance: 20
# of Authors: 1
Author Position: SF
Outcome: Consensus
Comments:
More LEWG review
As this is a large library, implementation experience will be helpful. Waiting for input from the authors on this topic.
P1708R0 Simple Statistics functions (Richard Dosselmann, Michael Wong)