cplusplus / papers

ISO/IEC JTC1 SC22 WG21 paper scheduling and management

P1708 R8 Basic Statistics #475

Open wg21bot opened 5 years ago

wg21bot commented 5 years ago

P1708R0 Simple Statistics functions (Richard Dosselmann, Michael Wong)

brycelelbach commented 5 years ago

Cologne 2019-07 LEWGI Minutes

P1708R0 Simple Statistical Functions For the Standard Library: Direction Review

Champion: Phillip Ratzloff

Minute Taker: Vincent Reverdy

Start Overview: 07-18 10:40

Add range versions of these algorithms.

Specifying the intermediate type with a template parameter seems problematic. Instead, add a three-argument version that takes an initial value (and uses the type of that initial value as the intermediate type).

Bikeshed all the names.

Rolling algorithm versions?

median should not require pre-sorting, it can be implemented more efficiently with nth_element.

Having median return a pair of iterators is a usability issue.

Why is it useful to get the range of the median? Why do you want the iterator to the median, instead of the value?

This paper should be using ForwardIterators not InputIterators.

Options for intermediate type APIs.

// 0
template <typename T = double, typename I>
T mean(I f, I l);

// 1
template <typename I, typename T>
T mean(I f, I l, T sum = /* iterator value type */, std::size_t n = 0);
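The three-argument form suggested in the comments above could look roughly like the following. This is a minimal sketch, not either of the recorded options and not the signature from P1708: the name `mean_with_init` is hypothetical, and returning the initial value for an empty range is an assumption made only to keep the example total.

```cpp
#include <cstddef>

// Hypothetical name; not a signature from P1708 or from the minutes.
// The intermediate (and result) type T is deduced from `init`,
// in the same way std::accumulate deduces its accumulator type.
template <typename ForwardIt, typename T>
T mean_with_init(ForwardIt first, ForwardIt last, T init) {
    std::size_t n = 0;
    for (; first != last; ++first, ++n) {
        init += static_cast<T>(*first);  // accumulate in T, not in the element type
    }
    // Assumption: an empty range simply yields the initial value unchanged.
    return n == 0 ? init : init / static_cast<T>(n);
}
```

With integer elements 1, 2, 4, passing `0` as the initial value performs integer arithmetic and yields 2, while passing `0.0` switches the whole computation to `double`. That is the usability trade-off the intermediate-type discussion is about.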

Issues to discuss/poll on:

Start Review: 10:55

Start Polling: 11:00

POLL: We should promise more committee time to pursuing simple statistical sequence algorithms in the standard library, knowing that our time is scarce and this will leave less time for other work.

NO OBJECTION TO UNANIMOUS CONSENT.

Attendance: 15

More discussion happened.

std::mean(v) vs. std::accumulate(v) / std::distance(v)

POLL: We should promise more committee time to pursuing std::mean in the standard library, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 2 | 3 | 4 | 5 | 1 |

Attendance: 17

More discussion happened.

More Polling: 11:47

POLL: We should promise more committee time to pursuing convenient versions of std::mode and std::median that return values not positions, require temporary storage, and do not require their input to be sorted, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 0 | 4 | 7 | 5 | 0 |

Attendance: 17
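For readers unfamiliar with the shape of the "convenient" median polled above, here is a minimal sketch, assuming a value-returning helper that copies its input into temporary storage and uses `std::nth_element` instead of a full sort. The name `median_value`, the `double` result type, and the `std::optional` return for empty input are illustrative assumptions, not the interface from P1708R0.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical helper; not the interface proposed in P1708R0.
template <typename ForwardIt>
std::optional<double> median_value(ForwardIt first, ForwardIt last) {
    // Temporary storage: the caller's range is neither sorted nor modified.
    std::vector<double> tmp(first, last);
    if (tmp.empty()) return std::nullopt;

    const auto mid = tmp.begin() + tmp.size() / 2;
    std::nth_element(tmp.begin(), mid, tmp.end());  // linear on average, no full sort
    if (tmp.size() % 2 == 1) return *mid;

    // Even count: the lower middle is the largest element left of `mid`.
    const double lower = *std::max_element(tmp.begin(), mid);
    return (lower + *mid) / 2.0;
}
```

The copy is the "temporary storage" cost mentioned in the poll; a position-returning version avoids it but requires sorted or mutable input.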

POLL: We should promise more committee time to pursuing P1708R0, knowing that our time is scarce and this will leave less time for other work.

| Strongly For | Weakly For | Neutral | Weakly Against | Strongly Against |
|---|---|---|---|---|
| 3 | 6 | 3 | 4 | 0 |

Attendance: 17

End: 11:57

Referral to SG6 for numerics review.

Conor Hoekstra and Vincent Reverdy will help the author out with the next revision.

CONSENSUS: Bring a revision of P1708R0, with the guidance below, to LEWGI for further direction review.

NAThompson commented 4 years ago

Friends,

Just got a boost version of this up and running:

https://github.com/boostorg/math/pull/248

This implementation (just of the mean, for now) brings a couple of things I think would be very useful: it adds C++17 parallel execution policies as well as the projections from Eric Niebler's ranges library. (I still do not think I've extracted anywhere near the full power of ranges, but the perfect is the enemy of the good, as they say.)

As to the comment that mean should be done via std::accumulate(v) / std::distance(v): I think this is not wrong, but suboptimal. See:

Robert F Ling. Comparison of several algorithms for computing sample means and variances. Journal of the American Statistical Association, 69(348): 859–866, 1974
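To illustrate the difference, one textbook alternative to sum-then-divide is the running update, where the mean is corrected by each new element rather than computed from a large intermediate sum. The sketch below is only an illustration of that idea; `running_mean` is a hypothetical name, and it is not claimed to match the code in the linked pull request.

```cpp
#include <cstddef>

// Hypothetical helper illustrating the running-mean update
//   m_k = m_{k-1} + (x_k - m_{k-1}) / k
template <typename ForwardIt>
double running_mean(ForwardIt first, ForwardIt last) {
    double m = 0.0;
    std::size_t n = 0;
    for (; first != last; ++first) {
        ++n;
        // The intermediate stays near the magnitude of the data,
        // so no large sum is ever formed.
        m += (static_cast<double>(*first) - m) / static_cast<double>(n);
    }
    return m;  // Assumption: an empty range yields 0.0.
}
```

For example, for the two values `1e308, 1e308` a plain sum overflows to infinity before the division, whereas the running update returns `1e308`.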

The algorithm in Boost for the mean is also discussed by Higham in "Accuracy and Stability of Numerical Algorithms". I think it's valuable to have since we cannot expect most people to understand why it's a good idea to do this. In addition, the ideas in this algorithm extend to stable methods of computing variance, skewness, and kurtosis, as well as parallelizable, single pass bivariate statistics. See:

Janine Bennett, Ray Grout, Philippe Pébay, Diana Roe, and David Thompson. Numerically stable, single-pass, parallel statistics algorithms. In 2009 IEEE International Conference on Cluster Computing and Workshops, pages 1–8. IEEE, 2009

Once the expectation is that we deploy Bennett's algorithm, we're well beyond what we can expect an average user to do correctly, so I'd say this would be a nice addition to the standard.
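As a rough illustration of what a single-pass, parallelizable scheme involves, the sketch below keeps a count, a mean, and a sum of squared deviations, and merges two partial results with the standard pairwise-combination formula. The `Moments` type and `merge` function are hypothetical names, and this covers only mean and variance, not the higher moments or bivariate statistics of the cited paper.

```cpp
#include <cstddef>

// Hypothetical accumulator: count, mean, and sum of squared deviations (m2).
struct Moments {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;

    // Single-pass (Welford-style) update with one new observation.
    void add(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
    }

    // Assumption: fewer than two observations yields 0.0.
    double sample_variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
};

// Combine two partial results computed on disjoint chunks of the data.
inline Moments merge(const Moments& a, const Moments& b) {
    if (a.n == 0) return b;
    if (b.n == 0) return a;
    Moments r;
    r.n = a.n + b.n;
    const double na = static_cast<double>(a.n);
    const double nb = static_cast<double>(b.n);
    const double total = na + nb;
    const double delta = b.mean - a.mean;
    r.mean = a.mean + delta * nb / total;
    r.m2 = a.m2 + b.m2 + delta * delta * na * nb / total;
    return r;
}
```

Because `merge` only needs the two partial summaries, each chunk can be processed on a separate thread and the results combined afterwards.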

jensmaurer commented 4 years ago

As a general note, this GitHub issue tracker is not for technical discussions, but for paper management / progress tracking only. Please post your technical discussions to the appropriate reflector.

wg21bot commented 4 years ago

P1708R1 Simple Statistical Functions (Michael Wong)

wg21bot commented 4 years ago

P1708R2 Simple Statistical Functions (Michael Wong, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)

jensmaurer commented 4 years ago

Prague pre-meeting telecon: This needs review by SG6 as well.

Cpp-Lisa commented 4 years ago

We looked at this in SG6 Monday in Prague, but without the principal author. We felt that the inclusion of the median and mode clouded the interface, but we think there's room for a class used with accumulate to collect the statistical moments, templated on the number of moments to collect.
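One possible reading of "a class used with accumulate, templated on the number of moments" is sketched below. This is purely illustrative: `PowerSums` is a hypothetical name, and it collects raw power sums, which is the simplest thing to write but can suffer from cancellation when central moments are derived from them. It is not a design that SG6 is recorded as endorsing.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical accumulator collecting the first N raw power sums.
template <std::size_t N>
struct PowerSums {
    std::size_t count = 0;
    std::array<double, N> sums{};  // sums[k] holds the sum of x^(k+1)

    // Raw moment of order k (1-based). Precondition: count > 0 and 1 <= k <= N.
    double raw_moment(std::size_t k) const {
        return sums[k - 1] / static_cast<double>(count);
    }
};

// operator+ makes the type usable as the init value of std::accumulate.
template <std::size_t N>
PowerSums<N> operator+(PowerSums<N> acc, double x) {
    ++acc.count;
    double p = 1.0;
    for (std::size_t k = 0; k < N; ++k) {
        p *= x;            // p is now x^(k+1)
        acc.sums[k] += p;
    }
    return acc;
}
```

A call such as `std::accumulate(v.begin(), v.end(), PowerSums<2>{})` then gathers the count, the sum, and the sum of squares in one pass.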

brycelelbach commented 4 years ago

Prague 2020-02 LEWGI Minutes

P1708R2 Simple Statistical Functions: Direction Review

Chair: Billy Baker

Champion: Ryan McDougall

Minute Taker: David Olsen

Start Review: 2020-02-11 10:01

Prior art:

Volunteers to help the author revise the proposal/people to contact:

End: 10:08

CONSENSUS: Further revision is needed before LEWGI can review this.

wg21bot commented 3 years ago

P1708R3 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy)

jensmaurer commented 3 years ago

From @fraggamuffin :

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1708r3.pdf

wg21.link/P1708

Passed SG19 review at the December 2020 meeting. Directing to SG6 and LEWG for review.

This document proposes an extension to the C++ library, to support simple statistical functions. Such functions, not presently found in the standard (including the special math library), frequently arise in scientific and industrial, as well as general, applications. These functions do exist in Python [1], the foremost competitor to C++ in the area of machine learning, along with Calc [2], Excel [3], Julia [4], MATLAB [5], PHP [6], R [7], Rust [8], SAS [9], SPSS [10] and SQL [11]. Further need for such functions has been identified as part of SG19 (machine learning) [12]. This is not the first proposal to move statistics in C++. In 2004, a number of statistical distributions were proposed in [13]. More such distributions followed in 2006 [14]. Statistical distributions ultimately appeared in the C++11 standard [15]. Distributions, along with statistical tests, are also found in Boost [16]. A series of special mathematical functions later followed as part of the C++17 standard [17]. A C library, GNU Scientific Library [18], further includes support for statistics, special functions and histograms.

wg21bot commented 3 years ago

P1708R4 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)

wg21bot commented 3 years ago

P1708R5 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)

brycelelbach commented 3 years ago

SG6 will look at this first, and they can send it to Library Evolution when they feel it is ready.

wg21bot commented 2 years ago

P1708R6 Simple Statistical Functions (Richard Dosselman, Michael Chiu, Richard Dosselmann, Eric Niebler, Phillip Ratzloff, Vincent Reverdy, Jens Maurer)

mattkretz commented 2 years ago

POLL: Any objection to unanimous consent to forward a new revision of P1708R6 containing the discussed changes to LEWG?

No objections to unanimous consent.

# of Authors: 2

# of Participants: 6

Design questions raised in SG6 which could be of interest to LEWG:

wxinix-2022 commented 1 year ago

Any sample implementation for P1708 so far?

wg21bot commented 1 year ago

P1708R7 Basic Statistics (Richard Dosselmann)

wg21bot commented 8 months ago

P1708R8 Basic Statistics (Richard Dosselmann)

ben-craig commented 5 months ago

2024-03-20 Library Evolution Tokyo

P1708R8: Basic Statistics

2024-03-20 Library Evolution Tokyo Minutes

Champion: Richard Dosselmann

Chair: Ben Craig

Minute Taker: Steve Downey

Summary

POLL: Facilities to compute basic statistics (mean, stddev, etc) belong in the standard library

| SF | WF | N | WA | SA |
|---|---|---|---|---|
| 14 | 4 | 5 | 0 | 1 |

Attendance: 20

# of Authors: 1

Author Position: SF

Outcome: Consensus

Comments:

Next Steps

More LEWG review

inbal2l commented 6 days ago

As this is a large library, implementation experience will be helpful. Waiting for input from the authors on this topic.