JuliaRobotics / KernelDensityEstimate.jl

Kernel Density Estimate with product approximation using multiscale Gibbs sampling
GNU Lesser General Public License v2.1

Help with the documentation #67

Open ClaudMor opened 3 years ago

ClaudMor commented 3 years ago

Hello,

I am interested in using your package, but I am not a domain expert in kernel density estimation or products of densities. From the README it is not clear to me which methods I may call on a BallTreeDensity. For example, I noted that calling rand on a BallTreeDensity like this:

using KernelDensityEstimate

extractions = randn(1000)   # 1000 draws from a standard normal
p = kde!(extractions)       # fit a kernel density estimate to the samples
rand(p)                     # draw a new sample from the fitted density

actually works.

  1. Could you add to the README a list of the methods one may call on a BallTreeDensity?
  2. More specifically, what does the resample method do?
  3. When fitting a multivariate, does the kernel assume that the different dimensions are uncorrelated? If so, is there a way to relax this assumption?
  4. Is there a way to evaluate the pdf of a BallTreeDensity at a point, even when this point is not included in the dataset from which we fit the kde? I mean something KernelDensity.jl-like (using the p from before):
pdf(p, 0.5) # evaluate the probability density of p at 0.5 (even though 0.5 was not included in `extractions`)

Regarding question 4, I saw this, but I didn't really understand it.

Great package!

Thanks in advance

dehann commented 3 years ago

Hi @claudio20497,

Thanks for posting and for the suggestions. I will add them as soon as I can, but in the meantime:

Call the object itself to get pdf density values:

X = kde!(randn(2,100))        # fit a 2D kde
densities = X(5*rand(2,10))   # evaluate the pdf at 10 query points (a dims x points matrix)
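
For question 4, evaluating the 1D density p at a point such as 0.5 works the same way. A minimal sketch, under the assumption that the dims x points query-matrix convention above also applies to the one-dimensional case:

p = kde!(randn(1000))      # 1D kde, as in the original question
p(reshape([0.5], 1, 1))    # assumed usage: evaluate the pdf at 0.5 via a 1x1 query matrix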

See KernelDensityEstimatePlotting.jl for plotting examples.

ClaudMor commented 3 years ago

Hello @dehann ,

Thank you very much for the detailed answer.

So concerning point 3, if I understood correctly: if I sample from, say, a bivariate distribution of two correlated variables and then call kde! on that sample, the resulting BallTreeDensity won't exhibit the correlation again, right?

EDIT: I did some experimenting:

using Distributions, KernelDensityEstimate, Plots

# generate correlated data as a 2 x N matrix (rows are dimensions), as kde! expects
x = rand(Uniform(-10, 10), 1000)
y = x .^ 2
data = Array(hcat(x, y)')

# fit a kde on the data
p_corr = kde!(data)

# sample from the kde (a 2 x 100 matrix of new points)
sample_p_corr = rand(p_corr, 100)

# sort both point sets by their first dimension and plot the data together with the sample
sorted_sample_p_corr = sample_p_corr[:, sortperm(sample_p_corr[1, :])]
sorted_data = data[:, sortperm(data[1, :])]

plot(sorted_data[1, :], sorted_data[2, :], lw = 3)
plot!(sorted_sample_p_corr[1, :], sorted_sample_p_corr[2, :], lw = 3)

# and note that they coincide very well

So I think the answer is: yes, the correlations are conserved.

dehann commented 3 years ago

Yes, correlations are conserved.

That's correct: the correlations are conserved. This remains true even though the individual kernel bandwidths that make up the kde use diagonal-only values.
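
A quick numerical check of this, as a minimal sketch (it assumes rand(p, N) returns samples as a dims x N matrix, matching the dims x N input convention of kde! used elsewhere in this thread):

using KernelDensityEstimate, Statistics

# correlated 2D samples: the second row is a noisy linear function of the first
raw = randn(2, 1000)
raw[2, :] .= 0.7 .* raw[1, :] .+ 0.3 .* raw[2, :]

p = kde!(raw)            # fit the kde; individual kernel bandwidths are diagonal-only
redraw = rand(p, 1000)   # draw new samples from the fitted kde

# the two correlation estimates should be close to each other
cor(raw[1, :], raw[2, :]), cor(redraw[1, :], redraw[2, :])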

Also note that the KernelDensityEstimatePlotting package exists, with the useful function plotKDE and a variety of keyword options.
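
A minimal usage sketch: plotKDE is the function named above, while the dims keyword shown here is only an assumption to illustrate the kind of keyword options available:

using KernelDensityEstimate, KernelDensityEstimatePlotting

p = kde!(randn(2, 200))
plotKDE(p)              # plot the fitted density with default settings
plotKDE(p, dims=[1])    # assumed keyword: restrict the plot to the first dimension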

ExpandingMan commented 3 years ago

I just wanted to step in and mention that I discovered this package today, it looks quite nice, but I think lack of documentation is going to make it quite difficult for me to use. Even a link to a review of the algorithms involved would be enormously helpful, coming in cold it's very unclear what most of these methods are doing.