JuliaStats / KernelDensity.jl

Kernel density estimators for Julia

KDE with weighted data #25

Closed: axsk closed this issue 8 years ago

axsk commented 8 years ago

I need to compute a KDE over weighted data, i.e. each entry of data::RealVector should count according to the corresponding entry of weights::RealVector, where sum(weights) == 1.

Am I right in thinking that all I need to change is ainc = 1.0 / (ndata*s*s) -> ainc = 1.0 / (s*s), and

```julia
grid[j] += (midpoints[k]-x)*ainc*weights[i] # where data[i] == x
grid[k] += (x-midpoints[j])*ainc*weights[i]
```

in univariate.jl?
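A quick check of why dropping ndata is safe: each observation deposits a total of (midpoints[k] - midpoints[j]) * ainc * w = w / s across its two neighbouring grid points, so the grid sums to sum(weights) / s, and s * sum(grid) == 1 exactly when sum(weights) == 1 and every point falls inside the grid. The sketch below is a standalone version of that binning loop with the proposed changes applied. The function name weighted_tabulate is hypothetical (not KernelDensity.jl API), and locating j and k with searchsortedfirst is an assumption about how the surrounding univariate.jl code finds the neighbouring midpoints.

```julia
# Hypothetical standalone sketch of linear binning with per-observation weights.
# `weighted_tabulate` is an illustrative name, not part of KernelDensity.jl.
function weighted_tabulate(data::AbstractVector{<:Real},
                           weights::AbstractVector{<:Real},
                           midpoints::AbstractRange)
    length(data) == length(weights) ||
        throw(DimensionMismatch("data and weights must have the same length"))
    npoints = length(midpoints)
    s = step(midpoints)
    grid = zeros(Float64, npoints)
    # No 1/ndata factor: weights assumed to sum to 1 provide the normalisation.
    ainc = 1.0 / (s * s)
    for (x, w) in zip(data, weights)
        k = searchsortedfirst(midpoints, x)  # first midpoint >= x (assumed lookup)
        j = k - 1                            # midpoint just below x
        if 1 <= j <= npoints - 1
            # Split weight w between the two neighbouring grid points,
            # giving more to the closer one.
            grid[j] += (midpoints[k] - x) * ainc * w
            grid[k] += (x - midpoints[j]) * ainc * w
        end
    end
    return grid
end

midpoints = range(0.0, 1.0; length = 11)
data = [0.25, 0.5, 0.93]
weights = [0.5, 0.3, 0.2]
grid = weighted_tabulate(data, weights, midpoints)
println(sum(grid) * step(midpoints))  # ≈ 1.0
```

With weights = fill(1/ndata, ndata) this reduces to the original ainc = 1.0 / (ndata*s*s) behaviour, which is what prompts the allocation question that follows.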

axsk commented 8 years ago

I wonder what the best way would be to include this option without allocating a whole fill(1/ndata, ndata) in the default case of equal weights.
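One way to avoid that allocation, sketched here as an assumption rather than anything the package does, is a lazy vector that stores only its length and returns 1/n for every index. Any tabulation routine that accepts weights::AbstractVector could then take it unchanged. UniformWeights is a hypothetical type written for illustration.

```julia
# Hypothetical lazy vector of equal weights 1/n. Only n is stored, so the
# unweighted case costs O(1) memory instead of fill(1/n, n).
struct UniformWeights <: AbstractVector{Float64}
    n::Int
end

# Minimal AbstractArray interface: size and linear getindex.
Base.size(w::UniformWeights) = (w.n,)
Base.IndexStyle(::Type{UniformWeights}) = IndexLinear()
function Base.getindex(w::UniformWeights, i::Int)
    @boundscheck checkbounds(w, i)
    return 1.0 / w.n
end

w = UniformWeights(1_000_000)
println(w[42])                 # 1.0e-6
println(sum(w))                # ≈ 1.0, up to floating-point rounding
println(Base.summarysize(w))   # a few bytes, independent of n
```

The FillArrays.jl package's Fill(1/n, n) is a ready-made lazy array with the same effect, if an extra dependency is acceptable.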

nignatiadis commented 8 years ago

@axsk I've also often wondered the same. In GLM.jl (https://github.com/JuliaStats/GLM.jl/blob/master/src/lm.jl) this is handled by allowing the weight vector to have length zero and then branching on isempty(wts) in the individual functions, with separate code paths for the weighted and unweighted cases. See also the discussion at https://github.com/JuliaStats/StatsBase.jl/issues/135.
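Applied to the binning loop quoted earlier in the thread, that convention might look like the sketch below: an empty wts means "unweighted", and the loop uses 1/ndata directly instead of reading from a filled vector. The name binned_density, the wts keyword, and the searchsortedfirst index lookup are assumptions for illustration, not KernelDensity.jl's or GLM.jl's actual code.

```julia
# Hypothetical sketch of the GLM.jl-style "empty weights = unweighted" convention.
function binned_density(data::AbstractVector{<:Real}, midpoints::AbstractRange;
                        wts::AbstractVector{<:Real} = Float64[])
    ndata = length(data)
    weighted = !isempty(wts)  # empty wts is the sentinel for equal weights
    if weighted && length(wts) != ndata
        throw(DimensionMismatch("wts must be empty or have length(data) entries"))
    end
    npoints = length(midpoints)
    s = step(midpoints)
    grid = zeros(Float64, npoints)
    ainc = 1.0 / (s * s)
    for i in 1:ndata
        x = data[i]
        # Unweighted branch: 1/ndata on the fly, no fill(1/ndata, ndata) needed.
        w = weighted ? wts[i] : 1.0 / ndata
        k = searchsortedfirst(midpoints, x)
        j = k - 1
        if 1 <= j <= npoints - 1
            grid[j] += (midpoints[k] - x) * ainc * w
            grid[k] += (x - midpoints[j]) * ainc * w
        end
    end
    return grid
end

mid = range(-2.0, 2.0; length = 41)
data = [-1.3, 0.1, 0.75, 1.6]
g_unweighted = binned_density(data, mid)
g_uniform = binned_density(data, mid; wts = fill(0.25, 4))
println(g_unweighted ≈ g_uniform)  # true
```

The empty default Float64[] is a small fixed-size allocation regardless of ndata. The trade-off compared with a lazy weights type is that the isempty check has to be repeated in every function that consumes the weights.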

axsk commented 8 years ago

cf. #26