JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/

documentation of quantile #138

Closed mcreel closed 1 year ago

mcreel commented 1 year ago

The documentation of quantile is confusing. In the help string shown when doing ?quantile, v[k] (bold, below) is not defined. Also, g (also in bold) is never used. Perhaps g should be γ?

For reference, the ?quantile help string is

help?> quantile
search: quantile quantile!

quantile(itr, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)

Compute the quantile(s) of a collection itr at a specified probability or vector or tuple of probabilities p on the interval [0,1]. The keyword argument sorted indicates whether itr can be assumed to be sorted.

Samples quantile are defined by Q(p) = (1-γ)*x[j] + γ*x[j+1], where x[j] is the j-th order statistic, and γ is a function of j = floor(n*p + m), m = alpha + p*(1 - alpha - beta) and g = n*p + m - j.

By default (alpha = beta = 1), quantiles are computed via linear interpolation between the points ((k-1)/(n-1), v[k]), for k = 1:n where n = length(itr). This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R and NumPy default.
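For readers trying to follow the formula in the help string above, here is a minimal Python sketch of it. It is not the Statistics.jl implementation. It assumes that `g` in the help string is meant to be γ (as suggested in this issue), that `v[k]` means the k-th order statistic `x[k]`, and that `j` and γ are clamped so that p = 0 and p = 1 stay in range. The names `sample_quantile` and `linear_interp_quantile` are hypothetical, chosen for illustration only.

```python
import math


def sample_quantile(data, p, alpha=1.0, beta=None):
    """Q(p) = (1-γ)*x[j] + γ*x[j+1], with x[1..n] the order statistics (1-based)."""
    if beta is None:
        beta = alpha  # mirrors the help string's default beta = alpha
    x = sorted(data)
    n = len(x)
    if n == 1:
        return x[0]
    m = alpha + p * (1 - alpha - beta)
    # Assumption: clamp j to 1..n-1 and γ to [0, 1] so the endpoints are valid.
    j = min(max(math.floor(n * p + m), 1), n - 1)
    gamma = min(max(n * p + m - j, 0.0), 1.0)  # reading "g" as γ
    # x[j] in 1-based notation is x[j - 1] in Python's 0-based indexing.
    return (1 - gamma) * x[j - 1] + gamma * x[j]


def linear_interp_quantile(data, p):
    """Linear interpolation between the points ((k-1)/(n-1), x[k]), k = 1..n (needs n >= 2)."""
    x = sorted(data)
    n = len(x)
    h = p * (n - 1)                    # fractional 0-based position
    k = min(math.floor(h), n - 2)      # left neighbour, kept in range at p = 1
    return x[k] + (h - k) * (x[k + 1] - x[k])
```

With alpha = beta = 1, m reduces to 1 - p, so n*p + m = p*(n-1) + 1. That is the 1-based fractional position used by the linear-interpolation description, which is why the two functions agree for the default parameters.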


mcreel commented 1 year ago

Thanks. This help string is not the same as the one that I reported. When doing ?quantile, one gets the string I reported. The version here is for a different method. I think that it does help to understand what v[k] is intended to mean, but the help message that I copied into my message is not clear, because there is no context for what v is. I did try to improve the formatting of the issue.


From: Moritz Schauer
Sent: Wednesday, January 18, 2023 13:50
To: JuliaStats/Statistics.jl
Cc: Michael David Creel; Author
Subject: Re: [JuliaStats/Statistics.jl] documentation of quantile (Issue #138)

This is difficult to read. Here it is formatted:

?quantile(v, w::AbstractWeights, p)

Compute the weighted quantiles of a vector v at a specified set of probability values p, using weights given by a weight vector w (of type AbstractWeights). Weights must not be negative. The weights and data vectors must have the same length. NaN is returned if x contains any NaN values. An error is raised if w contains any NaN values.

With FrequencyWeights, the function returns the same result as quantile for a vector with repeated values. Weights must be integers.

With non FrequencyWeights, denote N the length of the vector, w the vector of weights, $$h = p (\sum_{i \le N} w_i - w_1) + w_1$$ the cumulative weight corresponding to the probability p and $$S_k = \sum_{i \le k} w_i$$ the cumulative weight for each observation, define $$v_{k+1}$$ the smallest element of v such that $$S_{k+1}$$ is strictly superior to h. The weighted p quantile is given by $$v_k + \gamma (v_{k+1} - v_k)$$ with $$\gamma = (h - S_k)/(S_{k+1} - S_k)$$. In particular, when all weights are equal, the function returns the same result as the unweighted quantile.
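The weighted definition quoted above can be traced with a short Python sketch. It is only an illustration of the formulas as written, not the library's code. It assumes the values are sorted together with their weights (so that w_1 is the weight of the smallest value), that zero-weight observations are dropped, and that p = 1 returns the largest value. The function name `weighted_quantile` is hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Weighted p-quantile for non-frequency weights, following the quoted formulas."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must not be negative")
    # Assumption: sort values with their weights and drop zero-weight points.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    w1 = pairs[0][1]
    h = p * (total - w1) + w1          # cumulative weight corresponding to p
    s = 0.0                            # S_k, cumulative weight so far
    v_prev = pairs[0][0]               # v_k
    for v, w in pairs:
        s_next = s + w                 # S_{k+1}
        if s_next > h:                 # v is the smallest element with S_{k+1} > h
            gamma = (h - s) / (s_next - s)
            return v_prev + gamma * (v - v_prev)
        s, v_prev = s_next, v
    return pairs[-1][0]                # h equals the total weight (p = 1)
```

With equal weights, for example `weighted_quantile([1, 2, 3, 4], [1, 1, 1, 1], 0.5)`, h is 2.5 and the result is 2.5, the same as the default unweighted quantile, as the docstring claims.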


mschauer commented 1 year ago

Thank you, and thanks also for the formatting; I quoted the wrong docstring by accident. So yes, there are indeed some strange things. I think v[k] is now x[k], the order statistic, and perhaps g is γ. Maybe @lungben remembers?

mcreel commented 1 year ago

PR 139 offers a fix for this. https://github.com/JuliaStats/Statistics.jl/pull/139

nalimilan commented 1 year ago

Closed by https://github.com/JuliaStats/Statistics.jl/pull/139.