JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/

Quantile for multidimensional arrays #23

Open stanbiryukov opened 4 years ago

stanbiryukov commented 4 years ago

Hi there - it looks like most basic stats operations support multidimensional arrays, with the exception of quantile. Is there an explicit reason for not being able to compute percentiles along specific dimensions?

using Random
using Statistics
a = rand(5, 10)
median(a, dims=1)
# quantile(a, [.5], dims=1) # can we supply dims like in mean/median/var etc?

thchr commented 4 years ago

At least for sample quantiles, this seems to be more difficult to do generally: you'd have to establish an interpolating "surface" between the scattered observations in ℝᵈ or a meaningful sorting/center of observations in ℝᵈ. There isn't an unambiguous or natural sorting choice beyond d=1, so that would involve picking a convention.

Still, there seem to be several suggested approaches for multivariate sample quantiles: e.g., https://doi.org/10.1111/1467-9574.00195, https://doi.org/10.2307/2291681, or https://arxiv.org/pdf/0805.0056.pdf (the last of which basically lets multivariate quantiles mean depth contours).

I agree that the notion can have real practical use - but it may be more appropriate for a package than stdlib.

(For some simple cases, you could probably get away with just "flattening" your data into a 1D array or doing quantiles of slices along dimensions.)
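The two workarounds mentioned in the parenthetical can be sketched roughly as follows. This is only an illustration on a small deterministic matrix; the variable names `q_all` and `q_cols` are made up for the example, and `eachcol` assumes Julia 1.1 or later.

```julia
using Statistics

# Deterministic 5×10 matrix: column j holds the values 5(j-1)+1 through 5j.
a = reshape(collect(1.0:50.0), 5, 10)

# Option 1: "flatten" the array and treat all 50 entries as one sample.
q_all = quantile(vec(a), 0.5)

# Option 2: one quantile per slice along dimension 1, i.e. per column.
q_cols = [quantile(col, 0.5) for col in eachcol(a)]
```

Option 1 discards the array structure entirely, while option 2 returns a plain vector with one entry per column rather than the 1×10 matrix that `median(a, dims=1)` would give.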

stanbiryukov commented 4 years ago

I just mean a vectorized approach across specific dimensions, like the other basic stats functions that accept dims as an argument. Looking at the quantile source code, it already does the linear interpolation across the vector, so why not accept a dimension too, as opposed to looping through quantile calls? For example, it looks like median does this with a 'mapslices' call.
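As a rough sketch of what is being asked for, the same `mapslices` pattern can be applied to `quantile` by hand. The names `med` and `q50` are only for this example, and the comparison relies on the 0.5 quantile under the default settings coinciding with the median.

```julia
using Statistics

a = reshape(collect(1.0:50.0), 5, 10)

# Built-in reduction along dimension 1: returns a 1×10 matrix.
med = median(a, dims=1)

# Hand-written equivalent for the 0.5 quantile, applied to each column slice.
q50 = mapslices(x -> quantile(x, 0.5), a; dims=1)
```

Both results have the same shape, so a `dims` keyword on `quantile` could plausibly behave like the second call.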

aakhmetz commented 4 years ago

so, there is no way? :(

IsidoreBeautrelet commented 4 years ago

I agree, providing the option of specifying the dimension of an array to the quantile function would have real practical use! Hope to see this feature in future releases!

holomorphism commented 10 months ago

Is this issue still open? I am using the following definition for now, but I am not sure about the performance.

myquantile(A, p; dims, kwargs...) = mapslices(x->quantile(x, p; kwargs...), A; dims)

(Link: #10)

EDIT: Sorry, mapslices was already mentioned by @stanbiryukov, so I didn't have anything specific to add. It would be nice if this feature were implemented in Statistics.jl.
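For reference, a small usage sketch of the `myquantile` definition above, showing the output shapes for a scalar and a vector of probabilities. `myquantile` is the user-defined helper from this comment, not a function in Statistics.jl, and the `; dims` keyword shorthand assumes Julia 1.5 or later. It says nothing about performance, which would need to be measured separately.

```julia
using Statistics

# Definition from the comment above; not part of Statistics.jl.
myquantile(A, p; dims, kwargs...) = mapslices(x -> quantile(x, p; kwargs...), A; dims)

a = reshape(collect(1.0:50.0), 5, 10)

q1 = myquantile(a, 0.5; dims=1)                # scalar p, per column
q2 = myquantile(a, [0.25, 0.5, 0.75]; dims=1)  # vector p, per column
q3 = myquantile(a, [0.25, 0.5, 0.75]; dims=2)  # vector p, per row
```

With a scalar `p` the reduced dimension has length 1; with a vector `p` it has one entry per requested probability.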