JuliaStats / KernelDensity.jl

Kernel density estimators for Julia
Other
178 stars 40 forks source link

Weighted KDE #26

Closed axsk closed 7 years ago

axsk commented 8 years ago

In a project of mine I need a weighted KDE. I achieved this by weighting the increase in the tabulate method.

Unfortunately in the current version I need to allocate an array for the weights, which is not necessary with uniform weights. Any suggestions on how to tackle this?

ararslan commented 8 years ago

Since this package uses StatsBase, I'd recommend setting up the weighting using WeightVecs from StatsBase, or at least providing another method that uses them. Also, if you're adding weighted methods for univariate, I'd recommend adding similar functionality to bivariate.

nalimilan commented 8 years ago

To avoid allocating a vector of ones for unit weights, we could use a Ones/UnitWeights type as described at https://github.com/JuliaStats/StatsBase.jl/issues/135.

EDIT: you could also add UniformWeights since you appear to need them.

nalimilan commented 8 years ago

Could you also add tests?

lstagner commented 7 years ago

@axsk Are you planning to finish this PR? I also need to to calculate a weighted KDE.

axsk commented 7 years ago

@lstagner Hey, I would like to finish this PR, but I still don't know how to handle the uniform weights properly. I could work around the uniform array allocation by creating a new UniformWeight type, doing the job (but I think this belongs rather to StatsBase, maybe the next step :)) Edit: done

@ararslan I agree that having a bivariate weighted KDE would be nice, but I would like to leave that for someone else so far.. Edit: well, if @lstagner said it wasn't that difficulat I had to try it, went fast :)

@nalimilan How do you imagine the tests, I can't think of a simple example I can calculate in my head, so basically I could just test runnability.

lstagner commented 7 years ago

@axsk I got a little impatient and implemented it myself. It wasn't super difficult. I also did the bivariate case.

Heres the code https://github.com/lstagner/KernelDensity.jl/commit/b48fb2d309ef159b3f05209a074bfabc2138f797

axsk commented 7 years ago

So how do we proceed? :)

While I like using StatsBase.WeightsVec internally, I think the API should still allow plain Vectors. I will push my suggestion :)

axsk commented 7 years ago

I think its all done, but the tests. Any recommendations on those?

lstagner commented 7 years ago

Bump

lstagner commented 7 years ago

I still want this to be included. The only thing holding it back are some tests. @axsk perhaps you should just mimic the current tests to get started.

chriselrod commented 7 years ago

Also, how about letting weights be negative?

That would be useful when dealing with SparseGrids, for example.

ararslan commented 7 years ago

Negative weights, if they are currently possible, will not be possible soon in StatsBase.

chriselrod commented 7 years ago

That's fine, and less interesting than just adding weights in the first place.

A simple ad hoc means of implementing negative weights is to just divide your data into positive and negative weights, create two KDEs, and then use the sum of the respective weights to take a weighted difference of the pdf evaluations. Not saying this should be included in the package, just that if anyone really wants to make plots or something, it's easy enough to work around.

Is adding tests to improve coverage all that remains to be done here?

axsk commented 7 years ago

Tests are done, coverage improved, should I squash it all to one?

lstagner commented 7 years ago

Now that the tests have been added and all the checks have passed can this please get merged? I use this functionality daily and it would be great if didn't have to use a different branch.

lstagner commented 7 years ago

Bump @nalimilan @simonbyrne

lstagner commented 7 years ago

:tada: Happy Anniversary!!! :tada:

This PR has been open for 1 whole year!!! This is a major accomplishment. It has really beaten the odds. Honestly, I thought it was finished well over a month ago but I was proven wrong. Here's to another year. :clinking_glasses:

simonbyrne commented 7 years ago

Thanks @axsk

remoore commented 7 years ago

I'm trying to make use of the of the weighted KDE functionality described above. I can't seem to get it to work despite having run the following code, which is cut and paste from test/univariate.jl:

using Distributions
using KernelDensity

import KernelDensity: kde_range

r = kde_range((-2.0,2.0), 128)
X = [0.0]   
D = Normal
k6 = kde(X,r;kernel=D, weights=ones(X)/length(X))
k1 = kde([0.0, 1.], r, bandwidth=1, weights=[0,1])

I'm getting the following error message:

MethodError: no method matching kde(::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}; kernel=Distributions.Normal, weights=[1.0])
Closest candidates are:
  kde(::AbstractArray{T,1} where T<:Real, ::Range; bandwidth, kernel) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:136 got unsupported keyword argument "weights"
  kde(::AbstractArray{T,1} where T<:Real, ::Range, ::Distributions.Distribution{Distributions.Univariate,S} where S<:Distributions.ValueSupport) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:123 got unsupported keyword arguments "kernel", "weights"
  kde(::AbstractArray{T,1} where T<:Real; bandwidth, kernel, npoints, boundary) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:143 got unsupported keyword argument "weights"
  ...

Stacktrace:
 [1] (::KernelDensity.#kw##kde)(::Array{Any,1}, ::KernelDensity.#kde, ::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./<missing>:0
 [2] include_string(::String, ::String) at ./loading.jl:515

Any ideas why this isn't working?

I've only just started using Julia, so apologies in advance if this is a stupid question

lstagner commented 7 years ago

The weights keyword accepts a Weights (formally WeightsVec) type from StatsBase

The following works.

using Distributions
using KernelDensity

import KernelDensity: kde_range
import StatsBase: Weights

r = kde_range((-2.0,2.0), 128)
X = [0.0]   
D = Normal
k6 = kde(X,r;kernel=D, weights=Weights(ones(X)/length(X)))
k1 = kde([0.0, 1.], r, bandwidth=1, weights=Weights([0,1]))
remoore commented 7 years ago

Thanks for the help, but it's still not behaving:

`MethodError: no method matching kde(::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}; kernel=Distributions.Normal, weights=[1.0]) Closest candidates are: kde(::AbstractArray{T,1} where T<:Real, ::Range; bandwidth, kernel) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:136 got unsupported keyword argument "weights" kde(::AbstractArray{T,1} where T<:Real, ::Range, ::Distributions.Distribution{Distributions.Univariate,S} where S<:Distributions.ValueSupport) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:123 got unsupported keyword arguments "kernel", "weights" kde(::AbstractArray{T,1} where T<:Real; bandwidth, kernel, npoints, boundary) at /Users/remoore/.julia/v0.6/KernelDensity/src/univariate.jl:143 got unsupported keyword argument "weights" ...

Stacktrace: [1] (::KernelDensity.#kw##kde)(::Array{Any,1}, ::KernelDensity.#kde, ::Array{Float64,1}, ::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}) at ./:0 [2] include_string(::String, ::String) at ./loading.jl:515`

lstagner commented 7 years ago

Well it works on 0.5.1 (I haven't updated to 0.6 yet)

Edit: Actually I forgot I'm using my own version of this functionality.

Edit^2: Your example works on 0.5.1 using this branch (without using Weights as is needed on my branch).

lstagner commented 7 years ago

Also it could be that master hasn't been tagged yet. So try doing Pkg.checkout("KernelDensity") to the latest version.

ararslan commented 7 years ago

Yes, there have been no tags since this was merged

remoore commented 7 years ago

I've just checked out the latest version and it works on Julia 0.6.0.

Thanks for the help

axsk commented 6 years ago

Could we tag this version?

ararslan commented 6 years ago

https://github.com/JuliaLang/METADATA.jl/pull/11541