Closed vincentmaarek closed 11 months ago
Hi @vincentmaarek - glad to hear that you're considering whether fastKDE might be useful for your work! I can confirm the slow timing of the example you've given.
Unfortunately this is a known problem: the fit-then-predict mode of operation is different from the way fastKDE was originally designed to be used. fastKDE was designed to provide estimates of PDFs on a regular grid (not at arbitrarily spaced points), which allows use of the fast Fourier transform. Calculating the PDF at arbitrary points instead requires a direct Fourier transform, which has much worse computational performance: things get slow quickly as points are added. To roughly illustrate this, the direct Fourier transform (the bulk of the work in the call to `__transformphiSC_points__()`) should, in the case you've shown, loop over about 1e5 * 2 * 1e5 = 2e10 points (see https://github.com/LBL-EESA/fastkde/blob/030bf72b82aa69b8fb27dfca2ad0d74494085c6a/fastkde/nufft.pyx#L602). The operation runs on complex numbers (so there are roughly two floating-point calculations required per operation), and I count about 10 complex operations within the loop: that makes for a total of about 4e11 floating-point operations. Assuming a computer that can do 1e9 floating-point operations per second (a typical GHz processor), that comes out to about 400 seconds: similar to what you found.
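Spelling that back-of-envelope estimate out as a quick sanity check (all figures are the rough counts above, not measurements):

```python
# Rough cost estimate for the direct (non-uniform) Fourier transform
# in the example given; every number here is an order-of-magnitude guess.
n_data = 1e5                                # training points
n_eval = 1e5                                # evaluation points
inner_iterations = n_data * 2 * n_eval      # ~2e10 loop iterations
complex_ops_per_iter = 10                   # counted in the nufft inner loop
float_ops_per_complex_op = 2                # complex arithmetic ~= 2 float ops
total_flops = inner_iterations * complex_ops_per_iter * float_ops_per_complex_op
seconds = total_flops / 1e9                 # assuming ~1e9 FLOP/s
print(seconds)  # -> 400.0
```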
A fix would either avoid the DFT entirely or parallelize it, but either option would take a fair amount of work: more than I'm likely to be able to do in the near term. See https://github.com/LBL-EESA/fastkde/issues/8#issuecomment-1119108542 for more information.
Regarding your question about bandwidth:

> Moreover, are you planning to include the bandwidth as an input?

The short answer is no, because fastKDE works fundamentally differently from other KDE routines: the bandwidth (and in fact the shape of the kernel) is determined uniquely and optimally from the input data.
Hi Travis,
Thanks a lot for your answer.
I've found a solution regarding the prediction time. I don't know exactly what method the R `ks` package uses for the KDE estimation part, but its output is a grid, similar to what is produced by `kernel_par, axes = fastKDE.pdf(x, y)`. By digging into the R source code for `predict.kde`, it seems that the method takes each new data point and weights the 4 nearest grid points around it to produce the prediction (bilinear interpolation). I've replicated this function in Python (only the 2D version, as this is the case I need for now, but it can easily be generalized to other dimensions). I'm happy to share it with you, should this solution be useful to other users, or should you be interested in using it in your package.
```python
import numpy as np

def predict_kde(new_data, kernel_par, axes):
    """Evaluate a gridded 2-D KDE at new points by weighting the 4
    nearest grid points (bilinear interpolation, as in R's predict.kde)."""
    n = new_data.shape[0]
    M1, M2 = axes[0].shape[0], axes[1].shape[0]
    a1, a2 = axes[0][0], axes[1][0]        # grid lower bounds
    b1, b2 = axes[0][-1], axes[1][-1]      # grid upper bounds
    ixmin1, ixmax1 = 0, M1 - 2
    ixmin2, ixmax2 = 0, M2 - 2
    xdelta1 = (b1 - a1) / (M1 - 1)         # grid spacing, dimension 1
    xdelta2 = (b2 - a2) / (M2 - 1)         # grid spacing, dimension 2
    x1, x2 = new_data[:, 0], new_data[:, 1]
    est = np.zeros(n)
    for i in range(n):
        # fractional position of the point in grid coordinates
        xpos1 = (x1[i] - a1) / xdelta1
        xpos2 = (x2[i] - a2) / xdelta2
        ix1 = int(np.floor(xpos1))
        ix2 = int(np.floor(xpos2))
        fx1 = xpos1 - ix1
        fx2 = xpos2 - ix2
        if (ixmin1 <= ix1 < ixmax1) and (ixmin2 <= ix2 < ixmax2):
            # interior cell: full bilinear interpolation
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1 + 1, ix2] * fx1 * (1 - fx2)
                      + kernel_par[ix1, ix2 + 1] * (1 - fx1) * fx2
                      + kernel_par[ix1 + 1, ix2 + 1] * fx1 * fx2)
        elif (ix1 == ixmax1) and (ixmin2 <= ix2 < ixmax2):
            # last column of cells: drop the ix1 + 1 terms
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1, ix2 + 1] * (1 - fx1) * fx2)
        elif (ixmin1 <= ix1 < ixmax1) and (ix2 == ixmax2):
            # last row of cells: drop the ix2 + 1 terms
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1 + 1, ix2] * fx1 * (1 - fx2))
        elif (ix1 == ixmax1) and (ix2 == ixmax2):
            # corner cell
            est[i] = kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
    return est
```
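As a side note, the per-point loop can also be vectorized with NumPy. A minimal sketch, keeping the same `kernel_par[ix1, ix2]` indexing convention and assuming a uniform grid; it applies the full bilinear formula to every in-grid cell rather than reproducing the edge-case branches, and assigns 0 to points outside the grid (function name is mine):

```python
import numpy as np

def predict_kde_vec(new_data, kernel_par, axes):
    """Vectorized bilinear interpolation of a gridded 2-D KDE.

    Assumes uniformly spaced axes; points whose 4-point neighbourhood
    is not fully inside the grid get density 0.
    """
    a = np.array([axes[0][0], axes[1][0]])              # grid origins
    delta = np.array([axes[0][1] - axes[0][0],
                      axes[1][1] - axes[1][0]])         # grid spacings
    xpos = (new_data - a) / delta       # fractional grid coordinates, (n, 2)
    ix = np.floor(xpos).astype(int)
    fx = xpos - ix
    shape = np.array([axes[0].size, axes[1].size])
    ok = np.all((ix >= 0) & (ix < shape - 1), axis=1)   # in-grid mask
    est = np.zeros(new_data.shape[0])
    i1, i2 = ix[ok, 0], ix[ok, 1]
    f1, f2 = fx[ok, 0], fx[ok, 1]
    est[ok] = (kernel_par[i1, i2] * (1 - f1) * (1 - f2)
               + kernel_par[i1 + 1, i2] * f1 * (1 - f2)
               + kernel_par[i1, i2 + 1] * (1 - f1) * f2
               + kernel_par[i1 + 1, i2 + 1] * f1 * f2)
    return est
```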
This gives prediction times close to the `ks` package's, mitigated of course by some interpolation error, especially for new data points that fall outside the original grid.
Regarding the bandwidth, thank you for your comment. I indeed noticed that the bandwidth and the shape of the kernel are determined optimally. To be precise, the reason I need those two inputs is to be able to simulate from the estimated KDE (for instance by resampling a data point at random and adding kernel noise drawn with the fitted bandwidth). So to rephrase the question: is it possible to retrieve the kernel, or an equivalent bandwidth, from `fastkde`? Thanks again for your help.
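A minimal sketch of the kind of sampler I mean (a standard smoothed bootstrap; the function name and the Gaussian-kernel assumption are mine, and fastKDE's optimal kernel is generally not Gaussian, which is part of why I'm asking about the bandwidth):

```python
import numpy as np

def sample_kde(data, H, n_samples, rng=None):
    """Smoothed bootstrap for a Gaussian-kernel KDE with bandwidth matrix H:
    pick an observation uniformly at random, then add N(0, H) kernel noise."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, data.shape[0], size=n_samples)
    noise = rng.multivariate_normal(np.zeros(data.shape[1]), H, size=n_samples)
    return data[idx] + noise
```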
Thanks for sharing this code @vincentmaarek. Currently there is no capability for sampling from a fastKDE PDF, but I'd welcome a pull request if you ever get a chance.
And yes, it is possible to access the kernel directly (`fastKDE.kSC`, if `fastKDE.fastKDE()` is called with the `do_save_transformed_kernel` option), though some work would be needed to translate that into an estimate of the bandwidth.
Since this doesn't seem straightforward at the moment, I'm going to close this as 'wont-fix'.
Cheers
Hi,
I am currently trying to translate some R code into Python. Since KDE performs very well in terms of execution time in R using the `ks` package, `fastkde` seems to be the answer in Python. While the pdf part is extremely fast, I have some difficulty matching R's performance on the prediction step for a new data set. Here's a reproducible example:
```r
library(MASS)  # mvrnorm
library(ks)    # kde, predict.kde

data <- mvrnorm(1e5, mu = c(0, 0), Sigma = matrix(c(10, 3, 3, 2), nrow = 2))
kn <- kde(data, H = diag(2))
new_data <- mvrnorm(1e5, mu = c(1, 5), Sigma = matrix(c(20, 6, 6, 4), nrow = 2))

a <- Sys.time()
kn_pred <- predict(kn, x = new_data)
b <- Sys.time()
b - a  # Time difference of 0.005482912 secs
```
```python
import numpy as np
from fastkde import fastKDE
import time

N = 100000
var1 = 50 * np.random.normal(size=N) + 0.1
var2 = 0.01 * np.random.normal(size=N) - 300
var3 = 100 * np.random.normal(size=N) + 0.1
var4 = 0.01 * np.random.normal(size=N) - 150

train = np.vstack((var1, var2))
test = np.vstack((var3, var4))

myPDF = fastKDE.fastKDE(train)
myPDF.applyBernacchiaFilter()

a = time.time()
myPDF.__transformphiSC_points__(test)
b = time.time()
print(b - a)  # Time difference of 425.6120500564575
```