JuliaStats / KernelDensity.jl

Kernel density estimators for Julia

Density is 0.0 for y range of z:z #3

Closed bjarkehs closed 9 years ago

bjarkehs commented 10 years ago

Hey, I'm not sure if this is a bug or intended behavior, but given the following code:

x = [rand(1.0:10000.0) for i = 1:10000]
y = [rand(1.0:10.0) for i = 1:10000]
test = kde((x,y), (1:10000,10:10))
test.density

This returns a 10000x1 Array{Float64,2} in which every value is 0.0. You can confirm this by running the following, which also gives 0.0:

sum(test.density)

However, if you change the range so that it is not a z:z value, for instance:

x = [rand(1.0:10000.0) for i = 1:10000]
y = [rand(1.0:10.0) for i = 1:10000]
test = kde((x,y), (1:10000,9:10))
test.density

Then everything works as intended and gives you a 10000x2 Array{Float64,2} filled with density values.

simonbyrne commented 10 years ago

I'm curious: what would you expect the result of the first one to be?

bjarkehs commented 10 years ago

I would expect it to give me the density for that single value across the other range.

simonbyrne commented 10 years ago

Ah, I see: it's because we're missing the edge cases.

I guess the bigger point here is that you need to specify your range such that there is enough space between the extrema of your data and the edges of the range, in order to avoid the "wrap-around" effect from the convolution.
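As a rough illustration of that advice, the sketch below evaluates the same kind of data on a grid that extends past the data extrema on both axes. It reuses the kde((x,y), (xrange,yrange)) call form from the examples in this thread. The margins and step sizes are assumptions picked for illustration, not values recommended by the package.

```julia
using KernelDensity

# Same kind of data as in the report: x in 1..10000, y in 1..10.
x = rand(1.0:10000.0, 10000)
y = rand(1.0:10.0, 10000)

# Illustrative margins only (an assumption): extend each axis well past
# the data extrema so the smoothed mass near the edges has room to decay.
xgrid = -2000.0:20.0:12000.0
ygrid = -2.0:0.25:13.0

padded = kde((x, y), (xgrid, ygrid))

# One density value per (x, y) grid point.
size(padded.density) == (length(xgrid), length(ygrid))
```

How much padding is enough depends on the bandwidth, so the margins above should be treated as placeholders to adjust for your own data.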

bjarkehs commented 10 years ago

I see.

As a side note, I may be using kde incorrectly, or not quite understanding where to input what, but let's continue from my previous example.

If I have the following code:

x = [20.0]
y = [6.1]
test = kde((x,y))

I will get a BivariateKDE that looks something like this: BivariateKDE{FloatRange{Float64},FloatRange{Float64}}(16.4:0.05669291338582679:23.6,2.4999999999999996:0.056692913385826764:9.7,128x128 Array{Float64,2}

I can get the densities at specified points like this:

x = [20.0]
y = [6.1]
test = kde((x,y), (19:22,6.01:7.81))

However, if I change the ranges slightly (to something higher than my single y-value) then I get 0.0 densities:

x = [20.0]
y = [6.1]
test = kde((x,y), (19:22,6.23:7.81))

This seems strange, since the first example provides densities for values that should also be in the range of the last example.

simonbyrne commented 10 years ago

Ah, I see. The KDE is simply represented as a grid of density values at particular points. However, it can only incorporate observations that fall inside the grid: any that fall outside are discarded. The range argument is simply used to construct this grid.
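To make the discarding behaviour concrete, here is a minimal one-dimensional sketch in plain Julia. The helper tabulate_sketch is hypothetical and is not the package's actual code. It assumes a linear-binning scheme in which each observation is split between its two neighbouring grid points, and observations with no neighbour on both sides are skipped.

```julia
# Hypothetical helper (not part of KernelDensity.jl): spread each observation
# over its two neighbouring grid points and skip anything outside the grid.
function tabulate_sketch(data::AbstractVector{<:Real}, midpoints::AbstractRange)
    n = length(midpoints)
    s = step(midpoints)
    weights = zeros(Float64, n)
    for v in data
        k = searchsortedfirst(midpoints, v)  # index of first grid point >= v
        j = k - 1                            # grid point just below v
        if 1 <= j <= n - 1                   # both neighbours must exist
            weights[j] += (midpoints[k] - v) / s
            weights[k] += (v - midpoints[j]) / s
        end
    end
    return weights
end

y = [6.1]
inside  = tabulate_sketch(y, 6.01:7.81)  # 6.1 lies between two grid points
outside = tabulate_sketch(y, 6.23:7.81)  # 6.1 lies below the first grid point

sum(inside), sum(outside)
```

Under these assumptions the single observation contributes its full weight to the first grid and nothing to the second, which matches the all-zero densities reported above.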

What you want to do is interpolate the KDE between grid points: unfortunately we can't do that yet (hey, it is only version 0.0.2!), but it is on the TODO list. If you want to do this yourself, look at the https://github.com/timholy/Grid.jl library.

simonbyrne commented 9 years ago

I've now implemented interpolation (b2c9f7d6ba2d8c581f0390b07bb661841c031635), so hopefully that solves your problem. If not, please let me know.
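For readers arriving later, a short sketch of what querying an interpolated estimate can look like, assuming a KernelDensity.jl version that exports pdf for BivariateKDE objects. The data, centre point, and query locations are made up for illustration.

```julia
using KernelDensity

# Made-up data clustered around (20.0, 6.1), for illustration only.
x = randn(1000) .+ 20.0
y = randn(1000) .+ 6.1

# Let the package choose the grid, then query between grid points.
k = kde((x, y))

# Interpolated density at a single point.
d = pdf(k, 20.0, 6.1)

# Density along a slice of x values at one fixed y value.
slice = [pdf(k, xi, 6.1) for xi in 19.0:1.0:22.0]
```

This covers the original request for the density at a single y value across a range of x values, without requiring that value to be a grid point.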