Closed Cota123 closed 5 months ago
Hi @Cota123 - I'm glad to hear of your interest in fastKDE for this case.
Yes, it is technically possible (an example is given below) and fastKDE yields a higher-quality estimate of the probabilities than scipy.stats.gaussian_kde
, but this isn't a use-case that fastKDE was originally designed for. As a result, fastKDE is in fact slower than the scipy code. This is because the aspect of fastKDE that makes it 'fast' is the use of a non-uniform Fourier transform for performing the KDE in spectral and the use of an FFT for transforming the KDE estimate back to data space. For the case of estimating the KDE at specific points--and not on a uniform grid--the FFT cannot be used. Rather the implementation in fastKDE uses a direct Fourier transform, which is notoriously slow. (Also, the cython code could be optimized further to improve the speed.) The correct way to approach this would be to implement a non-uniform inverse Fourier transform. While this is technically possible, it would involve quite a lot of work: more than I have available.
So it may be that fastKDE isn't appropriate for your use case. I'll note though, that I would welcome a pull-request that improves the performance of the fastkde.pdf_at_points()
function. It might be possible to optimize the cython code substantially more than I have, or it might be easy to write a short numba routine that takes advange of threads or GPUs.
But in case you do want to try, here is an example that builds on yours:
import fastkde
x = np.random.normal(20, 2, 10000)
y = np.random.normal(20, 4, 10000)
pdf = fastkde.pdf_at_points(x, y)
# scatterplots colored based on density
plt.figure(figsize=(4, 3))
plt.scatter(x, y, c=pdf, s=5, marker='o', cmap='jet', alpha=1)
# x,y labels
plt.xlabel('x')
plt.ylabel('y')
plt.show()
Thank you @taobrienlbl so much for your reply.
I got the point that what I intended to do was not what fastkde
was desgined for. Also thank you for kindly letting me know that you welcome a pull-request, although it's alright for me for now.
Hi, First of all, I want to thank you for provinding us this wonderful package. I was trying to create a scatterplot that shows the density by the means the color of plots. The following code that I have written does works but it's too slow, especially when the datasize is large. So I was wondering how to do the same thing using
fastkde
. I really appreciate if you would help me with this.