JuliaStats / KernelDensity.jl

Kernel density estimators for Julia

pdf of new data points? #9

Closed quinnj closed 9 years ago

quinnj commented 9 years ago

Forgive me if I'm being naive, but once I run a univariate or bivariate kde, how would I go about getting the pdf of a new data point?

johnmyleswhite commented 9 years ago

At least the way the old KDE function worked, this is tricky and would need a new computation from scratch to achieve -- unless you were willing to accept an interpolation from the fitted grid of points.
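To make the trade-off concrete, here is a minimal Python sketch (not the package's code) that compares evaluating a Gaussian KDE from scratch at a new point with linearly interpolating a density precomputed on a grid. The function names, the fixed bandwidth `h`, and the grid bounds are all assumptions chosen for illustration.

```python
import math

def gaussian_kde_at(data, h, x):
    """Evaluate a Gaussian KDE directly at x: O(len(data)) work per query."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

def fit_grid(data, h, lo, hi, n):
    """Precompute the density on n evenly spaced grid points in [lo, hi]."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    return xs, [gaussian_kde_at(data, h, x) for x in xs]

def interp_pdf(xs, ys, x):
    """Linearly interpolate the gridded density; return 0.0 off the grid."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    step = xs[1] - xs[0]
    i = min(int((x - xs[0]) / step), len(xs) - 2)  # index of the cell's left edge
    t = (x - xs[i]) / step                         # position inside the cell, 0..1
    return (1.0 - t) * ys[i] + t * ys[i + 1]

# Made-up sample and bandwidth, purely for illustration.
data = [-1.2, -0.4, 0.1, 0.3, 1.5]
xs, ys = fit_grid(data, h=0.5, lo=-4.0, hi=4.0, n=257)
exact = gaussian_kde_at(data, 0.5, 0.26)   # recomputed from the raw data
approx = interp_pdf(xs, ys, 0.26)          # read off the fitted grid
```

On a reasonably fine grid the two values should agree closely, and the interpolated lookup costs the same regardless of how large the original sample was.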

simonbyrne commented 9 years ago

I think interpolation is the way to go here (it's what R does as well). We should be able to use Grid.jl for this.

simonbyrne commented 9 years ago

I've just been playing around with this. It's fairly easy to implement, but the interface requires some thought. Here are a couple of options:

  1. XXvariateKDE objects stay the same. We define CoordInterpGrid(::XXvariateKDE, ...) methods that construct an interpolation object. Users have to work with both.
  2. We include the CoordInterpGrid object inside the XXvariateKDE objects, and instantiate them at construction. The interface can either be:
     a. overload getindex, e.g. k[1.1]
     b. overload call, e.g. k(1.1)
     c. overload pdf, e.g. pdf(k, 1.1)
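For comparison, the three call styles in option 2 can be mocked up side by side. This is a rough Python analogue only: `GriddedDensity` is a hypothetical stand-in for a fitted KDE with an embedded interpolant, and `__getitem__` and `__call__` play the roles of Julia's getindex and call overloads.

```python
from bisect import bisect_right

class GriddedDensity:
    """Hypothetical stand-in for a fitted KDE: grid points plus density values."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)

    def _interp(self, x):
        # Linear interpolation on the grid, zero outside it.
        xs, ys = self.xs, self.ys
        if x < xs[0] or x > xs[-1]:
            return 0.0
        i = min(bisect_right(xs, x) - 1, len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1.0 - t) * ys[i] + t * ys[i + 1]

    def __getitem__(self, x):  # style (a): k[1.1]
        return self._interp(x)

    def __call__(self, x):     # style (b): k(1.1)
        return self._interp(x)

def pdf(k, x):                 # style (c): pdf(k, 1.1)
    return k._interp(x)

# A triangular density on [0, 2], used only as toy data.
k = GriddedDensity([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
same = k[0.5] == k(0.5) == pdf(k, 0.5)  # all three spellings return 0.5
```

All three spellings compute the same thing, so the choice is about readability: indexing suggests array lookup, calling suggests the estimate is a function, and `pdf` names the quantity being returned.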

Thoughts/preferences?

@dcjones What would be convenient for Gadfly?

quinnj commented 9 years ago

I think my vote would be pdf(k, 1.1). It seems the most clear to me.

johnmyleswhite commented 9 years ago

Are there options for how Grid does interpolation? If so, I'd say that it would be easier to do interpolate(kde(x)) than to autogenerate an interpolation object -- because then all of the interpolation options need to be re-exposed by kde.

That said, pdf(k, 1.1) is really nice.

simonbyrne commented 9 years ago

There are two options that need to be set for Grid:

  1. The boundary condition (what to do outside the area). I guess the standard default here should be to use zero.
  2. The interpolation method (nearest, linear, quadratic, cubic). We need to investigate more, but quadratic is probably a useful default.
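A small Python sketch of how those two settings interact, assuming an evenly spaced grid. The `interpolate` function and its `method` and `fill` parameters are hypothetical names, not Grid.jl's interface, and only the nearest and linear methods are shown; quadratic and cubic are left out for brevity.

```python
def interpolate(xs, ys, x, method="linear", fill=0.0):
    """Look up x on an evenly spaced grid xs with values ys.

    fill   -- boundary condition: the value returned outside [xs[0], xs[-1]]
    method -- "nearest" or "linear"
    """
    if x < xs[0] or x > xs[-1]:
        return fill
    pos = (x - xs[0]) / (xs[1] - xs[0])  # fractional grid index
    if method == "nearest":
        return ys[min(int(pos + 0.5), len(ys) - 1)]  # round half up
    if method == "linear":
        i = min(int(pos), len(xs) - 2)
        t = pos - i
        return (1.0 - t) * ys[i] + t * ys[i + 1]
    raise ValueError("unsupported method: " + method)

# Toy grid values, not a real fitted density.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.4, 0.4, 0.1]
inside_linear = interpolate(xs, ys, 0.25)                     # blends ys[0] and ys[1]
inside_nearest = interpolate(xs, ys, 0.25, method="nearest")  # snaps to ys[0]
outside = interpolate(xs, ys, 5.0)                            # boundary value, 0.0
```

Zero is a natural fill value for a density because the fitted grid normally extends past the data far enough that the estimate has already decayed at the edges.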

simonbyrne commented 9 years ago

The other issue is overhead: for most plotting cases, you don't need to construct the interpolation object, and so allocating the extra array for the InterpGrid object is unnecessary.

simonbyrne commented 9 years ago

Could we create another type, InterpKDE, that contains both an XXvariateKDE and a CoordInterpGrid object and is created via an interpkde method?

johnmyleswhite commented 9 years ago

I guess all of this makes me think we should not generate an interpolation by default, but perhaps we can do that automatically if you call pdf on a KDE object?
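One way to read this suggestion is as lazy construction: the fitted object carries only the grid, and the interpolant is built and cached the first time pdf is called. Below is a minimal Python sketch of that pattern; `FittedKDE`, `_build_interpolant`, and the `builds` counter are hypothetical names used only for illustration.

```python
from bisect import bisect_right

class FittedKDE:
    """Hypothetical fitted KDE that holds only the grid until pdf() is called."""

    def __init__(self, xs, ys):
        self.xs, self.ys = list(xs), list(ys)
        self._interp = None  # built lazily on the first pdf() call
        self.builds = 0      # counts constructions, for illustration only

def _build_interpolant(k):
    xs, ys = k.xs, k.ys
    # Per-cell slopes: the extra array that plotting-only users never pay for.
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

    def interp(x):
        if x < xs[0] or x > xs[-1]:
            return 0.0
        i = min(bisect_right(xs, x) - 1, len(xs) - 2)
        return ys[i] + slopes[i] * (x - xs[i])

    k.builds += 1
    return interp

def pdf(k, x):
    """Evaluate the density at x, building the interpolant on first use."""
    if k._interp is None:
        k._interp = _build_interpolant(k)
    return k._interp(x)

# Toy triangular density on [0, 2].
k = FittedKDE([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
first = pdf(k, 0.5)   # triggers construction
second = pdf(k, 1.5)  # reuses the cached interpolant
```

Plotting code that only reads the grid never pays for the extra array, while repeated pdf calls pay the construction cost once.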

simonbyrne commented 9 years ago

Okay, I've added this functionality. Let me know what you think of the interface. I'll give it a day or two, and then I'll push a new release (this may annoy some Gadfly users...)

quinnj commented 9 years ago

Thanks! Looks great.