JeffreyRacine / R-Package-np

R package np (Nonparametric Kernel Smoothing Methods for Mixed Data Types)
https://socialsciences.mcmaster.ca/people/racinej
46 stars 18 forks source link

Individual observation kernel values #45

Closed EfthymiosCosta closed 6 months ago

EfthymiosCosta commented 6 months ago

Dear Jeffrey,

Thank you very much for this package, it is very useful! As part of a project, I would be interested in obtaining the value of the kernel functions for each individual observation rather than their average (which is the prediction). I am giving a quick example below:

`Y <- matrix(rnorm(100, 0, 1), nrow = 50, ncol = 2) # Generate some data

Y <- as.data.frame(Y)

colnames(Y) <- c('V1', 'V2')

kde <- np::npudens(~ V1 + V2, data = Y) # Compute KDE using the np package

new_point <- data.frame(V1 = 1, V2 = 0.5) # Random new observation

pred <- predict(kde, newdata = new_point)`

Now the object pred (let us denote this by $x$) includes one value which is the predicted density, given by: $$\frac{1}{nh} \sum_{i=1}^n W \left( \frac{Y_i-x}{h}\right)$$ I am interested in the individual $W_i$ values though, therefore the values of the kernel function for each of the points in the initial data frame on which the density was estimated and the point $x$. This will be a 50-dimensional vector. Is there a way to evaluate this somehow? Obviously I could create my own function for the kernel but for multiple points I fear this procedure will take a lot of time; if these values are evaluated within predict() and then averaged, is there a way to extract them?

Thank you for your time!

JeffreyRacine commented 6 months ago

Hi,

Certainly! See ?npksum, illustration using your outline is below...

  library(np)
  set.seed(123)

  n <- 50
  k <- 2

  h1 <- 0.25
  h2 <- 0.75

  Y.train <- matrix(rnorm(n*k), nrow = n, ncol = k)
  Y.eval <- matrix(c(1,.5),nrow=1,ncol=2)

  kweights <- npksum(bws=c(h1,h2),
                     txdat=Y.train,
                     exdat=Y.eval,
                     return.kernel.weights=TRUE)$kw

  ## Use raw weights to compute, e.g., the density (here kweights is 50 x 1, i.e.,
  ## number of sample observations times number of evaluation points)

  sum(kweights)/(n*h1*h2)

  ## Compute density using npudens()

  fitted(npudens(bws=c(h1,h2), tdat=Y.train, edat=Y.eval))

Hope this helps, and thanks for your interest in the package.

Jeff