chengsoonong / mclass-sky

Multiclass methods for astronomical data
BSD 3-Clause "New" or "Revised" License

KDE to determine whether a test point is near training set #143

Status: Closed (chengsoonong closed this issue 7 years ago)

chengsoonong commented 7 years ago

Use Kernel Density Estimation (KDE) on the training set to estimate where the training data lie in feature space. Then evaluate the estimated density at each example in the test set.

http://scikit-learn.org/stable/modules/density.html

chengsoonong commented 7 years ago

The longer-term goal of this issue is to see whether the test examples that lie in low-density regions are also the ones the Gaussian process regressor predicts with high uncertainty.
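One way to check this hypothesis, sketched on hypothetical 1-D data: evaluate the training-set KDE at the test points, take the predictive standard deviation from scikit-learn's `GaussianProcessRegressor` (`predict(..., return_std=True)`), and compute a rank correlation between the two. The kernel choice (`RBF + WhiteKernel`), the bandwidth, and the use of Spearman's rho are assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
# Hypothetical regression data: training inputs cover only [-3, 3],
# while test inputs extend to [-6, 6] to include low-density regions.
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)
X_test = np.linspace(-6, 6, 50).reshape(-1, 1)

# Log-density of the training inputs, evaluated at each test input.
kde = KernelDensity(bandwidth=0.5).fit(X_train)
log_dens = kde.score_samples(X_test)

# GP predictive standard deviation at each test input.
gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01), random_state=0)
gp.fit(X_train, y_train)
_, std = gp.predict(X_test, return_std=True)

# A strongly negative rank correlation supports the hypothesis that
# low-density test points are also the ones predicted with high uncertainty.
rho, _ = spearmanr(log_dens, std)
print(f"Spearman rho between log-density and GP std: {rho:.2f}")
```

A rank correlation is used because log-density and standard deviation live on very different scales and their relationship need not be linear.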

If low-density test points do turn out to be the uncertain ones, we could potentially use the KDE together with the SGDRegressor to approximate a predictor that reports uncertainty.
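A rough sketch of such a combination, assuming a hypothetical wrapper class `DensityAwareSGD` (not part of scikit-learn or this repository). It returns the `SGDRegressor` prediction together with the KDE log-density and flags test points whose density falls below a chosen quantile of the training-set densities. The 5% quantile and the bandwidth are assumptions, and the flag is only a proxy for uncertainty, not a calibrated variance.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.neighbors import KernelDensity


class DensityAwareSGD:
    """Hypothetical wrapper: SGD predictions plus a KDE-based low-density flag."""

    def __init__(self, bandwidth=0.5, quantile=0.05):
        self.regressor = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
        self.kde = KernelDensity(bandwidth=bandwidth)
        self.quantile = quantile  # assumed cut-off for "low density"

    def fit(self, X, y):
        self.regressor.fit(X, y)
        self.kde.fit(X)
        # Threshold: the chosen quantile of the training-set log-densities.
        self.threshold_ = np.quantile(self.kde.score_samples(X), self.quantile)
        return self

    def predict(self, X):
        y_pred = self.regressor.predict(X)
        log_dens = self.kde.score_samples(X)
        low_density = log_dens < self.threshold_  # True = treat as uncertain
        return y_pred, log_dens, low_density


# Hypothetical linear data for illustration.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 2))
y_train = X_train @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=500)

model = DensityAwareSGD().fit(X_train, y_train)
y_pred, log_dens, low = model.predict(np.array([[0.0, 0.0], [6.0, 6.0]]))
print(low)
```

Keeping the regressor and the density model separate means the fast SGD fit is unchanged; the KDE only adds a per-point reliability signal on top.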

chengsoonong commented 7 years ago

See issue #77

chengsoonong commented 7 years ago

To sanity-check your KDE, plot the density as a function of RA and DEC, i.e. as a two-dimensional surface. Because there are so many points, you will need a 2D histogram. Compare the result with Alasdair's plot (which uses a hex_map) in: https://github.com/chengsoonong/mclass-sky/blob/master/projects/alasdair/notebooks/02_exploratory_analysis.ipynb
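A sketch of the binning step with `np.histogram2d`, assuming RA and DEC are in degrees and a per-object density value (for example the exponentiated KDE output) is already available. The helper name `binned_mean_density`, the bin counts, and the synthetic inputs are hypothetical. Summing the densities with `weights=` and dividing by the raw counts gives the mean density per sky bin; the counts themselves are returned too, for comparison with an object-count map.

```python
import numpy as np


def binned_mean_density(ra, dec, density, bins=(90, 45)):
    """Average a per-object density over a 2D (RA, DEC) grid.

    Assumes RA in [0, 360) and DEC in [-90, 90] degrees.
    Returns (mean_density, counts, ra_edges, dec_edges); empty bins are NaN.
    """
    ranges = [[0.0, 360.0], [-90.0, 90.0]]
    counts, ra_edges, dec_edges = np.histogram2d(ra, dec, bins=bins, range=ranges)
    sums, _, _ = np.histogram2d(ra, dec, bins=bins, range=ranges, weights=density)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 gives NaN for empty bins
    return mean, counts, ra_edges, dec_edges


# Hypothetical inputs standing in for the survey catalogue and KDE output.
rng = np.random.default_rng(3)
ra = rng.uniform(0, 360, 10_000)
dec = rng.uniform(-30, 30, 10_000)
density = np.exp(-dec**2 / 200)

mean, counts, ra_edges, dec_edges = binned_mean_density(ra, dec, density)
print(mean.shape)
```

The grids are indexed `[ra_bin, dec_bin]`, so plotting with matplotlib needs a transpose, e.g. `plt.pcolormesh(ra_edges, dec_edges, mean.T)`.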

nbgl commented 7 years ago

Plot each pair of bands against the predicted density, and compare this against the actual density.
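One possible reading of this comment, sketched below: for every pair of bands, fit a 2D KDE on those two columns and evaluate it at the bin centres of a normalised 2D histogram (`np.histogram2d(..., density=True)`), which serves as the "actual" empirical density on the same grid. The helper name `compare_pairwise_density`, the bin count, the bandwidth, and the assumption of standardised band magnitudes are all hypothetical choices for illustration.

```python
import itertools

import numpy as np
from sklearn.neighbors import KernelDensity


def compare_pairwise_density(X, bins=30, bandwidth=0.2):
    """For each pair of columns (bands), return (kde_grid, hist_grid).

    Both grids are indexed [x_bin, y_bin] and hold density values,
    so they can be plotted side by side or differenced directly.
    """
    results = {}
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        pair = X[:, [i, j]]
        # Empirical density: normalised 2D histogram of the two bands.
        hist, xe, ye = np.histogram2d(pair[:, 0], pair[:, 1], bins=bins, density=True)
        # Bin centres on the same grid as the histogram.
        xc = 0.5 * (xe[:-1] + xe[1:])
        yc = 0.5 * (ye[:-1] + ye[1:])
        gx, gy = np.meshgrid(xc, yc, indexing="ij")
        grid = np.column_stack([gx.ravel(), gy.ravel()])
        # Predicted density: 2D KDE evaluated at the bin centres.
        kde = KernelDensity(bandwidth=bandwidth).fit(pair)
        kde_grid = np.exp(kde.score_samples(grid)).reshape(gx.shape)
        results[(i, j)] = (kde_grid, hist)
    return results


# Hypothetical stand-in for three standardised bands.
rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 3))
results = compare_pairwise_density(X)
for (i, j), (k, h) in results.items():
    print(i, j, np.mean(np.abs(k - h)))
```

Evaluating the KDE on the histogram's own bin centres keeps the two grids aligned, so a per-bin difference (as printed above) is a direct measure of disagreement.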

chengsoonong commented 7 years ago

Resolved in eb2537f