JeffreyRacine / R-Package-np

R package np (Nonparametric Kernel Smoothing Methods for Mixed Data Types)
https://socialsciences.mcmaster.ca/people/racinej

Issue with npksum for 1 continuous/categorical variable #46

Closed EfthymiosCosta closed 5 months ago

EfthymiosCosta commented 5 months ago

Dear Jeffrey,

While working on a project, I used the npksum function and ran into some unexpected behaviour. Take X to be a dataset with 1 continuous and 2 categorical variables:

X <- data.frame('V1' = rnorm(100, 0, 1), 'V2' = sample(c(0, 1), size = 100, replace = TRUE), 'V3' = sample(c(0, 1, 2), size = 100, replace = TRUE))
X[, 2] <- as.factor(X[, 2])
X[, 3] <- as.factor(X[, 3])

I then use npksum because I want to extract the kernel weights:

bws <- c(1, 0.5, 0.5)
npksum_res <- np::npksum(bws=bws, txdat=X, exdat=X, return.kernel.weights=TRUE)

I get the following error:

Error in npksum.default(txdat = txdat, bws = tbw, exdat = exdat, return.kernel.weights = return.kernel.weights) :
  supplied bandwidths do not match 'txdat' in type

I am not sure why this error is raised. The same error occurs with one categorical and multiple continuous variables. However, if I give it only continuous variables, only categorical variables, or a combination of both types (with at least 2 variables of each type), it works perfectly. Why is this error raised, and do you have any suggestions for troubleshooting it?

Thanks!
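As a first troubleshooting step, it can help to confirm what the inputs actually look like before they reach npksum. The sketch below is only a sanity check under stated assumptions, not a diagnosis of the error: it rebuilds X as above (the seed is an arbitrary choice, and `col_classes` is a hypothetical helper name), prints the class of each column, checks that there is one bandwidth per column, and reports the installed np version if the package is available.

```r
# Rebuild the example data; the seed is arbitrary and only makes the run repeatable.
set.seed(1)
X <- data.frame('V1' = rnorm(100, 0, 1),
                'V2' = sample(c(0, 1), size = 100, replace = TRUE),
                'V3' = sample(c(0, 1, 2), size = 100, replace = TRUE))
X[, 2] <- as.factor(X[, 2])
X[, 3] <- as.factor(X[, 3])
bws <- c(1, 0.5, 0.5)

# Class of each column: one numeric column followed by two factors.
col_classes <- sapply(X, function(col) class(col)[1])
print(col_classes)

# One bandwidth is expected per column of X.
stopifnot(length(bws) == ncol(X))

# Report the installed np version, if np is installed at all.
if (requireNamespace("np", quietly = TRUE)) {
  print(utils::packageVersion("np"))
}
```

If the column classes and the number of bandwidths look right, comparing the installed np version against the current release is a cheap next check.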

JeffreyRacine commented 5 months ago

Hi,

Apologies, I am unable to replicate this using the current version. That being said, since exdat and txdat are identical, passing exdat does nothing (I left it in below and used your exact code from above).

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] MASS_7.3-60.0.1 cubature_2.1.0 compiler_4.3.2 Matrix_1.6-5 quantreg_5.97
[6] tools_4.3.2 SparseM_1.81 survival_3.5-8 MatrixModels_0.5-3 Rcpp_1.0.12
[11] splines_4.3.2 np_0.60-17 grid_4.3.2 boot_1.3-30 lattice_0.22-5
[16] quadprog_1.5-8

packageVersion("np")
[1] ‘0.60.17’

X <- data.frame('V1' = rnorm(100, 0, 1), 'V2' = sample(c(0, 1), size = 100, replace = TRUE), 'V3' = sample(c(0, 1, 2), size = 100, replace = TRUE))

X[, 2] <- as.factor(X[, 2])

X[, 3] <- as.factor(X[, 3])

bws <- c(1, 0.5, 0.5)

npksum_res <- np::npksum(bws=bws, txdat=X, exdat=X, return.kernel.weights=TRUE)

names(npksum_res)
[1] "bw" "data.names" "nobs" "ndim" "nord" "nuno" "ncon" "pscaling"
[9] "ptype" "pckertype" "pukertype" "pokertype" "eval" "ksum" "kw" "p.ksum"
[17] "ntrain" "trainiseval"
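Since exdat and txdat are identical in the call above, the same kernel weights should be obtainable without exdat. The following is a minimal sketch of that comparison, assuming np is installed and that the call succeeds on the installed version (on an affected version it would presumably stop with the error reported above). The seed and the names `res_with_exdat` and `res_no_exdat` are illustrative choices, not part of the original code.

```r
library(np)

# Same data layout as in the report; the seed is arbitrary.
set.seed(42)
X <- data.frame('V1' = rnorm(100, 0, 1),
                'V2' = sample(c(0, 1), size = 100, replace = TRUE),
                'V3' = sample(c(0, 1, 2), size = 100, replace = TRUE))
X[, 2] <- as.factor(X[, 2])
X[, 3] <- as.factor(X[, 3])
bws <- c(1, 0.5, 0.5)

# Evaluate on the training data explicitly (exdat = X) and implicitly (no exdat).
res_with_exdat <- np::npksum(bws = bws, txdat = X, exdat = X,
                             return.kernel.weights = TRUE)
res_no_exdat <- np::npksum(bws = bws, txdat = X,
                           return.kernel.weights = TRUE)

# The kernel weights are stored in the "kw" component listed above.
print(dim(res_no_exdat$kw))
print(all.equal(res_with_exdat$kw, res_no_exdat$kw))
```

Dropping exdat keeps the call shorter and avoids passing the same data frame twice when evaluation happens on the training points.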