JeffreyRacine / R-Package-np

R package np (Nonparametric Kernel Smoothing Methods for Mixed Data Types)
https://socialsciences.mcmaster.ca/people/racinej

Epanechnikov kernel with least squares cross-validation #9

Closed saketkc closed 9 years ago

saketkc commented 9 years ago

It seems that the Epanechnikov kernel with least squares cross-validation has a bug. See: http://stats.stackexchange.com/questions/176906/np-package-kernel-density-estimation-with-epanechnikov-kernel

JeffreyRacine commented 9 years ago

Hi.

Not a bug… check the bandwidth (undersmoothed) and see the FAQ… thanks!

— Jeff

On Oct 25, 2015, at 1:01 AM, Saket Choudhary notifications@github.com wrote:

It seems that Epanechnikov kernel with least squares cross-validation has a bug. See: http://stats.stackexchange.com/questions/176906/np-package-kernel-density-estimation-with-epanechnikov-kernel

saketkc commented 9 years ago

Thanks. Just for future reference this is stated in FAQ 2.31 at https://cran.r-project.org/web/packages/np/vignettes/np_faq.pdf

JeffreyRacine commented 9 years ago

Yes, but you might also mention the question itself, as FAQ numbers can change. Thanks!

Professor J. S. Racine
Department of Economics, McMaster University
1280 Main St. W., Hamilton, Ontario, Canada. L8S 4M4
Phone: (905) 525 9140 x 23825
e-mail: racinej@mcmaster.ca
URL: www.economics.mcmaster.ca/racine

`The generation of random numbers is too important to be left to chance'

On Oct 25, 2015, at 08:44, Saket Choudhary notifications@github.com wrote:

Thanks. Just for future reference this is stated in FAQ 2.31 at https://cran.r-project.org/web/packages/np/vignettes/np_faq.pdf

saketkc commented 9 years ago

I use plot() (npplot()) to plot, say, a density and the resulting plot looks like an inverted density rather than a density

This can occur when the data-driven bandwidth is dramatically undersmoothed. Data-driven (i.e., automatic) bandwidth selection procedures are not guaranteed always to produce good results, due perhaps to the presence of outliers or the rounding/discretization of continuous data, among other causes. By default, npplot() takes the two extremes of the data (the minimum and maximum, i.e., actual data points), creates an equally spaced grid of evaluation data between them (i.e., not actual data points in general), and computes the density at these points. Since the bandwidth is extremely small, the density estimate at these evaluation points is correctly zero, while the estimates at the sample realizations (in this case only two, the min and max) are non-zero; hence we get two peaks at the edges of the plot and a flat bowl equal to zero everywhere else. This can also happen when your data is heavily discretized and you treat it as continuous. In such cases, treating the data as ordered may result in more sensible estimates.
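The effect described in the FAQ entry can be reproduced outside the np package. What follows is a minimal sketch in plain Python and not np's own code: the sample values, the bandwidths, and the helper names (`epanechnikov`, `kde`, `grid`) are all assumptions made for illustration. It evaluates an Epanechnikov kernel density estimate on an equally spaced grid running from the sample minimum to the sample maximum, first with a tiny bandwidth and then with a wider one.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x, data, h):
    """Kernel density estimate at point x with bandwidth h."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def grid(lo, hi, n):
    """n equally spaced evaluation points from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n - 1)] + [hi]


# Hypothetical sample: only the minimum and maximum coincide with grid points.
data = [0.0, 0.13, 0.37, 0.61, 0.88, 1.0]
points = grid(min(data), max(data), 11)

# Assumed, dramatically undersmoothed bandwidth: no interior grid point
# lies within h of any observation, so the estimate there is exactly zero.
estimates = [kde(x, data, 1e-3) for x in points]

# Assumed wider bandwidth for comparison: every grid point has neighbours.
smoothed = [kde(x, data, 0.3) for x in points]

for x, tiny, wide in zip(points, estimates, smoothed):
    print(f"x={x:.1f}  h=0.001: {tiny:8.3f}  h=0.3: {wide:6.3f}")
```

With the tiny bandwidth, only the two end points of the grid receive a non-zero estimate, which gives the "inverted density" shape with a flat bowl in between. With the wider bandwidth, every grid point has a positive estimate.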