LBL-EESA / fastkde


Prediction execution time #11

Closed vincentmaarek closed 11 months ago

vincentmaarek commented 1 year ago

Hi,

I am currently trying to translate some R code into Python. Since KDE performs very well in terms of execution time in R using the ks package, fastkde seems to be the answer in Python.

While the pdf part is extremely fast, I am having difficulty matching R's performance for the prediction part on a new data set. Here is a reproducible example:

- R example

library(MASS)  # mvrnorm
library(ks)    # kde, predict.kde

data <- mvrnorm(1e5, mu = c(0, 0), Sigma = matrix(c(10, 3, 3, 2), nrow = 2))
kn <- kde(data, H = diag(2))

new_data <- mvrnorm(1e5, mu = c(1, 5), Sigma = matrix(c(20, 6, 6, 4), nrow = 2))

a <- Sys.time()
kn_pred <- predict(kn, x = new_data)
b <- Sys.time()

b - a  # Time difference of 0.005482912 secs


- Python example

import numpy as np
from fastkde import fastKDE
import time

N = 100000
var1 = 50 * np.random.normal(size=N) + 0.1
var2 = 0.01 * np.random.normal(size=N) - 300

var3 = 100 * np.random.normal(size=N) + 0.1
var4 = 0.01 * np.random.normal(size=N) - 150

train = np.vstack((var1, var2))
test = np.vstack((var3, var4))

myPDF = fastKDE.fastKDE(train)
myPDF.applyBernacchiaFilter()

a = time.time()
myPDF.__transformphiSC_points__(test)
b = time.time()

print(b - a)  # Time difference of 425.6120500564575



So, is there a way to achieve the same performance for the prediction part as in R?
Moreover, are you planning to include the bandwidth as an input?

Thanks in advance.
taobrienlbl commented 1 year ago

Hi @vincentmaarek - glad to hear that you're considering whether fastKDE might be useful for your work! I can confirm the slow timing of the example you've given.

Unfortunately this is a known problem, as this train-then-predict mode of operation is different from the way fastKDE was originally designed to be used. fastKDE was originally designed to provide estimates of PDFs on a regular grid (not at randomly spaced points), which allows use of the fast Fourier transform. Calculating the PDF at random points in fastKDE requires a direct Fourier transform, which has much worse computational performance: things get slow quickly as points are added. To roughly illustrate this, the direct Fourier transform (the bulk of the work in the call to __transformphiSC_points__()) loops, in the case you've shown, over roughly 1e5 x 2 x 1e5 = 2e10 point pairs (see https://github.com/LBL-EESA/fastkde/blob/030bf72b82aa69b8fb27dfca2ad0d74494085c6a/fastkde/nufft.pyx#L602). The operation runs on complex numbers (so there are two floating-point calculations required per operation), and I count about 10 complex operations within the loop: that makes for a total of about 4e11 floating point operations. Assuming a computer that can do 1e9 floating point operations per second (a typical GHz processor), that comes out to about 400 seconds: similar to what you found.
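
To make the arithmetic explicit, the back-of-the-envelope estimate above can be written as a short snippet (the per-loop operation counts are the rough figures from the paragraph above, not measured values):

# Back-of-the-envelope cost estimate for the direct Fourier transform.
# All counts are rough assumptions following the reasoning above.
n_pairs = 1e5 * 2 * 1e5        # data points x frequency-space points looped over (~2e10)
complex_ops_per_pair = 10      # approximate complex operations inside the loop
flops_per_complex_op = 2       # each complex operation costs ~2 floating-point operations
flops_per_second = 1e9         # a typical ~GHz processor

total_flops = n_pairs * complex_ops_per_pair * flops_per_complex_op   # ~4e11
print(total_flops / flops_per_second)   # ~400 seconds, similar to the observed timing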

A fix would either avoid the DFT or parallelize it, but either approach would take a fair amount of work: more than I'm likely to be able to do in the near term. See https://github.com/LBL-EESA/fastkde/issues/8#issuecomment-1119108542 for more information.

Regarding your question about bandwidth:

Moreover, are you planning to include the bandwidth as an input?

The short answer is no, because fastKDE works fundamentally differently from other KDE routines. The bandwidth (and in fact the shape of the kernel) is determined uniquely and optimally from the input data.

vincentmaarek commented 1 year ago

Hi Travis,

Thanks a lot for your answer.

I've found a solution regarding the prediction time. I don't know exactly which method the R ks package uses for the KDE estimation part, but the output is a grid, similar to what is produced by kernel_par, axes = fastKDE.pdf(x, y). Digging into the R source code for the function predict.kde, it seems that the method takes the grid and, for each new data point, weights the 4 nearest grid points around it to produce the prediction (bilinear interpolation). I've replicated this function in Python (only the 2D version, as this is the case I need for now, but it can easily be generalized to other dimensions). I'm happy to share it with you, in case this solution is useful to other users or you would like to include it in your package.

def predict_kde(new_data, kernel_par, axes):
    """Bilinearly interpolate a gridded 2D PDF (kernel_par on the grid given by axes) at new_data points."""

    n = new_data.shape[0]

    # Grid sizes along each dimension
    M1 = axes[0].shape[0]
    M2 = axes[1].shape[0]

    # Grid limits along each dimension
    a1 = axes[0][0]
    a2 = axes[1][0]
    b1 = axes[0][-1]
    b2 = axes[1][-1]

    # Index bounds and grid spacing (the grid is assumed to be regular)
    ixmin1 = 0
    ixmax1 = M1 - 2
    ixmin2 = 0
    ixmax2 = M2 - 2
    xdelta1 = (b1 - a1) / (M1 - 1)
    xdelta2 = (b2 - a2) / (M2 - 1)

    x1 = new_data[:, 0]
    x2 = new_data[:, 1]
    est = np.zeros(n)

    for i in range(n):
        # Fractional position of the point within the grid
        xpos1 = (x1[i] - a1) / xdelta1
        xpos2 = (x2[i] - a2) / xdelta2
        # Index of the lower-left corner of the enclosing cell and offsets within the cell
        ix1 = int(np.floor(xpos1))
        ix2 = int(np.floor(xpos2))
        fx1 = xpos1 - ix1
        fx2 = xpos2 - ix2

        if (ixmin1 <= ix1) & (ix1 < ixmax1) & (ixmin2 <= ix2) & (ix2 < ixmax2):
            # Interior cell: weight the 4 surrounding grid points
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1 + 1, ix2] * fx1 * (1 - fx2)
                      + kernel_par[ix1, ix2 + 1] * (1 - fx1) * fx2
                      + kernel_par[ix1 + 1, ix2 + 1] * fx1 * fx2)

        elif (ix1 == ixmax1) & (ixmin2 <= ix2) & (ix2 < ixmax2):
            # Last cell along dimension 1: drop the upper neighbor in that dimension
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1, ix2 + 1] * (1 - fx1) * fx2)

        elif (ixmin1 <= ix1) & (ix1 < ixmax1) & (ix2 == ixmax2):
            # Last cell along dimension 2: drop the upper neighbor in that dimension
            est[i] = (kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)
                      + kernel_par[ix1 + 1, ix2] * fx1 * (1 - fx2))

        elif (ix1 == ixmax1) & (ix2 == ixmax2):
            # Last cell in both dimensions
            est[i] = kernel_par[ix1, ix2] * (1 - fx1) * (1 - fx2)

        # Points that fall outside the grid keep est[i] = 0

    return est

This makes it possible to get prediction times close to the ks package, at the cost of some interpolation error, especially for new data points that fall outside the estimation grid.
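
In case it helps, this is roughly how I call it on top of the fastKDE.pdf output; I am assuming here that the returned PDF array needs to be transposed so that its first index runs along axes[0], which may depend on the fastKDE version:

import numpy as np
from fastkde import fastKDE

# Illustrative 2D data (hypothetical values, just to show the call pattern)
x = np.random.normal(size=10000)
y = np.random.normal(size=10000)
kernel_par, axes = fastKDE.pdf(x, y)

new_data = np.column_stack((np.random.normal(size=1000),
                            np.random.normal(size=1000)))

# ASSUMPTION: the PDF array comes back with axes[1] as its first index, so it is
# transposed here to match the kernel_par[ix1, ix2] indexing used in predict_kde
est = predict_kde(new_data, kernel_par.T, axes)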

Regarding the bandwidth, thank you for your comment. I indeed noticed that the bandwidth and the shape of the kernel are optimally determined. To be precise, the reason why I need those two inputs is to be able to simulate from the estimated KDE. One of the algorithms I had in mind is the usual smoothed-bootstrap approach: draw one of the original data points at random, then perturb it with noise drawn from the kernel, scaled by the bandwidth.

So, to rephrase the question: is there a way to sample from the estimated PDF, or to access the kernel and an estimate of the bandwidth so that this kind of sampling can be implemented?

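As an illustration, here is a minimal sketch of the sampling step I have in mind, assuming a Gaussian kernel and a known bandwidth matrix H (H being precisely the piece I cannot currently obtain from fastKDE):

import numpy as np

def sample_from_kde(data, H, n_samples, rng=None):
    # Smoothed bootstrap: pick original observations at random and perturb each one
    # with noise drawn from the kernel. ASSUMPTION: a Gaussian kernel with a known
    # bandwidth matrix H; fastKDE's kernel is data-driven and not Gaussian in general,
    # so this is only an illustration of the kind of sampler I would like to build.
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, data.shape[0], size=n_samples)
    noise = rng.multivariate_normal(np.zeros(data.shape[1]), H, size=n_samples)
    return data[idx] + noise
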
Thanks again for your help.

taobrienlbl commented 11 months ago

Thanks for sharing this code @vincentmaarek. Currently there is no capability for sampling from a fastKDE PDF, but I'd welcome a pull request if you ever get a chance.

And yes, it is possible to access the kernel directly (fastKDE.kSC if fastKDE.fastKDE() is called with the do_save_transformed_kernel option), though some work would be needed to translate that to an estimate of the bandwidth.
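
For reference, accessing the kernel would look roughly like the sketch below; the keyword and attribute names follow the description above, but treat the exact call as an assumption since the details may differ between fastKDE versions:

import numpy as np
from fastkde import fastKDE

var1 = np.random.normal(size=10000)
var2 = np.random.normal(size=10000)

# ASSUMPTION: do_save_transformed_kernel is accepted as a keyword argument and the
# kernel is then stored on the kSC attribute, as described above; translating this
# into a bandwidth estimate would require additional work.
kde_obj = fastKDE.fastKDE(np.vstack((var1, var2)), do_save_transformed_kernel=True)
kernel = kde_obj.kSC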

Since this doesn't seem straightforward at the moment, I'm going to close this as 'wont-fix'.

Cheers