Reuse Bandwidth or Density

wadehenning commented 4 years ago

Once I estimate a density for a set of training points I would like to be able to store the density and have it available for classification of new test points. My current approach for estimating likelihoods looks something like:

pNormal = kde!(randn(2,10000))
sampleNormal = resample(pNormal, 1000)
likelihoods = evaluateDualTree(pNormal,sampleNormal)

But estimating the density every time I need it is too slow for multiple large data sets in real time. Looking at BallTreeDensity, it seems that there is no way for me to store the whole density estimate (in a database) for later use. However, I suspect most of the kde! calculation time is spent on the LOOCV bandwidth estimation. Is there a way for me to store the bandwidth and reuse it on subsequent calls to kde!() based on the same training data?

Thanks for any thoughts.

dehann commented 3 years ago

Hi, sorry I did not see this before. Yes, you can just pass in the bandwidth:

pts = randn(2,100)
p = kde!(pts)
bw = getBW(p)[:,1]

p_ = kde!(pts, bw)
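
For the classification use case in the original question, the same idea can be sketched end to end. This is only a sketch under assumptions: that getBW returns a dims x npoints matrix holding one per-dimension bandwidth repeated for every point (hence taking the first column), and that evaluateDualTree accepts a plain matrix of test points as its second argument.

```julia
using KernelDensityEstimate

# Training data: 2 dimensions x 1000 points (each column is one point).
pts = randn(2, 1000)

# First fit: with only points given, kde! runs the automatic bandwidth selection.
p = kde!(pts)

# Keep one column of the bandwidth matrix (assumed identical across points).
bw = getBW(p)[:, 1]

# Later fits on the same training data: pass the stored bandwidth,
# which skips the bandwidth search.
p_fast = kde!(pts, bw)

# Evaluate likelihoods of new test points (assumed matrix method).
testpts = randn(2, 50)
likelihoods = evaluateDualTree(p_fast, testpts)
```

Only bw (a short vector, one entry per dimension) needs to be kept between runs, as long as the training points themselves are still available.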

dehann commented 3 years ago

Regarding storing in a database, we have some capacity to serialize. Currently you can dump the whole KDE object as a string with strp = string(p) and then recover it with convert(SamplableBelief, strp). This is likely to be updated to a more formal serialized format, such as JSON or BSON, sometime in 2021.
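
If the string route is not convenient, one alternative (not the mechanism described above, just a sketch) is to store only the raw points and the bandwidth vector and rebuild the density on load. The helper names save_kde_params and load_kde are hypothetical, and the sketch assumes getPoints returns the dims x npoints matrix of sample points. It uses Julia's Serialization standard library, whose format is not guaranteed to be stable across Julia versions, so a database column would more likely hold the arrays in some portable encoding.

```julia
using KernelDensityEstimate
using Serialization  # Julia standard library

# Hypothetical helper: write the points and per-dimension bandwidth to disk.
function save_kde_params(path, p)
    pts = getPoints(p)      # assumed: dims x npoints matrix
    bw = getBW(p)[:, 1]     # assumed: same bandwidth in every column
    serialize(path, (pts = pts, bw = bw))
    return path
end

# Hypothetical helper: rebuild the density without redoing bandwidth selection.
function load_kde(path)
    data = deserialize(path)
    return kde!(data.pts, data.bw)
end

p = kde!(randn(2, 500))
file = save_kde_params(tempname(), p)
p_restored = load_kde(file)
```

The rebuild still constructs the ball tree, but it avoids the bandwidth search, which is the part suspected above to dominate the cost.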

wadehenning commented 3 years ago

I just wanted to comment that I have been using the ability to obtain and modify the bandwidth; it is a great feature that I appreciate (today I am using it in a smoothing context).

JuliaRobotics / KernelDensityEstimate.jl

Reuse Bandwidth or Density #55