JuliaStats / KernelDensity.jl

Kernel density estimators for Julia
Other
178 stars 40 forks source link

Wrap around below #51

Open oxinabox opened 6 years ago

oxinabox commented 6 years ago

I noticed that at https://github.com/JuliaStats/KernelDensity.jl/blob/bf4191d3eeada071d308b058c922b2aba6aa4003/src/univariate.jl#L96-L100 if a point should be allocated to the lower boundary it is just discarded. Meaning after tabulate the bottom bin is always empty.

As I read the original Jones and Lotwick paper https://www.jstor.org/stable/pdf/2347674.pdf there solution in the equivalent case is to wrap-around. Make sense as the FFT-convolution will wrap around anyway

Now in most usage, the lower (and upper) bins should be sufficiently far from the edge, so as to avoid wrap-around effects from FFT-convolution. (I am running it to it because I am explicitly wanting wrap around effects).

Possible rather than employing wrap-around in tabulate, the alternative might be to just throw an error in an else statement added to that if Saying "Lower boundary too tight, please expand lower boundary" It seem bad to just silently throw away data

This only handled the uni-variate case, if I am right it can be extended to the bivariate case

oxinabox commented 6 years ago

Bump. This is passing on 0.6

simonbyrne commented 6 years ago

I'm broadly sympathetic, but why just wrap for the one step below? Since we're using midpoints, shouldn't it be half a step below, half a step above? Or should we just wrap all?

oxinabox commented 6 years ago

I think that actually equivalent to this it comes down to how where indexing in the first place and we have this empty bin at the bottom

oxinabox commented 6 years ago

I think I need to draw some pictures I know that my solution stops any data being thrown away and it feels like it is equivalent to Jones and Lotwick's but they index differently, so wrap above to below IIRC

oxinabox commented 6 years ago

Ok, after playing with this, and master I've concluded the wrap-around doesn't matter as much as things that are exactly equal to the lower boundary midpoint, actually end up in the bottom bin

For mps= 1:9

Master:

julia>  KernelDensity.tabulate([1.0], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

This PR

julia> KernelDensity.tabulate([1.0], mps).density |> print
[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

So it doesn't seem right that the lower boundary just gets stiffed like that. Hitting the upper boundary gets you in the upper bin. Hitting the lower boundary gets you drops. It is basically saying this is exclusive below inclusive above.

You are correct that when it comes to actual wrap-around, it only actually ends up wrapping round from below to above.

Full details

With this PR:

julia> mps = KernelDensity.kde_range((1,9), 9)
1.0:1.0:9.0

julia> KernelDensity.tabulate([1.0], mps).density |> print
[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia> KernelDensity.tabulate([1.5], mps).density |> print
[0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia> KernelDensity.tabulate([0.5], mps).density |> print
[0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
julia> KernelDensity.tabulate([9.0], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

julia> KernelDensity.tabulate([9.5], mps).density |> print
ERROR: BoundsError: attempt to access 9-element StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}} at index [10]

julia>  KernelDensity.tabulate([1.01], mps).density |> print
[0.99, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

Current master:

julia> mps = KernelDensity.kde_range((1,9), 9)
1.0:1.0:9.0

julia>  KernelDensity.tabulate([1.0], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia>  KernelDensity.tabulate([0.5], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia>  KernelDensity.tabulate([9.0], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
julia>  KernelDensity.tabulate([1.5], mps).density |> print
[0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia>  KernelDensity.tabulate([8.5], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
julia>  KernelDensity.tabulate([9.5], mps).density |> print
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
julia>  KernelDensity.tabulate([1.01], mps).density |> print
[0.99, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
simonbyrne commented 6 years ago

This could probably be handled better (maybe an optional periodic argument?), but in any case it is an improvement. Would you mind adding a test or two, and perhaps comment, and then we can merge

oxinabox commented 6 years ago

Maybe rather than wrap around I could workout how to include the lower boundary and the an error on out of bounds

matbesancon commented 6 years ago

@oxinabox you wanted to add some modifications, some tests? Maybe we can finish this one?

oxinabox commented 6 years ago

My thesis is due in 1 week. So no. Not right now.

matbesancon commented 6 years ago

no problem, just wanted to see pending PRs, good luck :)