LBL-EESA / fastkde

Other
52 stars 11 forks source link

How to predict density at each given data point? #34

Closed leanhdung1994 closed 6 months ago

leanhdung1994 commented 7 months ago

In my case, I have a sample z containing 100,000 observations. I want to predict the density at each data point in w:

import numpy as np
import fastkde

 N = int(1e5)
 z = 50*np.random.normal(size=N) + 0.1
 w = list(range(10, 0, -2))

Could you elaborate on how to do so? Thank you so much for your elaboration!

taobrienlbl commented 6 months ago

Hi @leanhdung1994 - there are three ways to do this. The most straightforward might be the pdf_at_points approach.

""" Method 1 (slow): pdf_at_points. """

# use a DFT to calculate the PDF at the points in w (works even if w are unevenly spaced, but is slow b/c a FFT isn't used)
pdf_at_w = fastkde.pdf_at_points(z, list_of_points = w)

pdf_at_w

""" Method 2 (fast, but some error): use xarray and nearest neighbor interpolation. """

# calculate the PDF with a large number of points for more accurate nearest neighbor interpolation
pdf_z = fastkde.pdf(z, num_points = 1025, var_names = ['z'])

pdf_at_w = pdf_z.sel(z = w, method = 'nearest').values
pdf_at_w

""" Method 3 (fast, but tricky): force the PDF to be calculated at the points in w. """

# set the values at which the PDF will be calculated
# * values must be evenly spaced
# * the number of values must be a power of two plus 1 (e.g., 65)
# * the range of values should be substantially larger than the range of z (which is ~ +/- 200 here)
axes = [np.arange(-525, 526, 1)[:1025]]

# calculate the PDF on the grid defined by `axes`
pdf_z = fastkde.pdf(z, axes = axes, var_names = ['z'])

pdf_at_w = pdf_z.sel(z = w).values
pdf_at_w