damellis / ESP

The Example-based Sensor Predictions (ESP) system applies machine learning to real-time sensor data.
BSD 3-Clause "New" or "Revised" License

Custom / additional scoring of training samples for ANBC and GMM. #264

Open damellis opened 8 years ago

damellis commented 8 years ago

Currently, we have a generic scorer that's based on the information gain of the sample (i.e. the negative log of the probability of classifying the sample correctly). This can, I think, be calculated for all classifier types, but it's only really useful for those with a relatively smooth, gradual probability distribution over the feature space. For example, it works well for SVM classifiers (e.g. the Touché example).
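As a concrete sketch of that generic score (the function name and `eps` guard are hypothetical, not part of the ESP codebase):

```python
import math

def information_gain_score(p_correct, eps=1e-12):
    """Score a training sample as the negative log of the probability
    the classifier assigns to its correct class. Confident samples
    (p near 1) score near 0; low-confidence samples score high.
    eps guards against log(0)."""
    return -math.log(max(p_correct, eps))
```

This works well when `p_correct` varies smoothly across the feature space, but degenerates when the classifier only ever reports probabilities near 0 or 1.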

This approach is not great for Naive Bayes on a few features (e.g. accelerometer poses), because the predicted probability is often either 0 or 1. Here, it might be better to score samples by their distance to the predicted class (e.g. to its mean in feature space) rather than by the probability.
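A distance-based score along those lines might look like the following sketch, assuming ANBC's per-class Gaussian model with diagonal covariance (function and parameter names are hypothetical):

```python
import math

def distance_score(sample, class_mean, class_std):
    """Normalized Euclidean (diagonal Mahalanobis) distance from a sample
    to the mean of its predicted class. Unlike a 0-or-1 probability, this
    varies smoothly even for samples far from the training data."""
    return math.sqrt(sum(
        ((x - m) / s) ** 2
        for x, m, s in zip(sample, class_mean, class_std)
    ))
```

A sample sitting exactly on the class mean would score 0, and the score grows continuously as the sample moves away, which gives a usable ranking even when the Bayes posterior has already saturated.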

It's also not very good for DTW, because the probabilities don't tend towards 100% as additional training data is collected. Here we may need something more custom: retrain the model with the new sample added, find the new exemplar template for the sample's class, and take its distance to the old exemplar template.
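That custom procedure could be sketched like this, assuming 1-D sequences, an absolute-difference cost, and "exemplar" meaning the medoid template (all names are hypothetical; GRT's DTW internals differ):

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def exemplar(templates):
    """Pick the medoid: the template with minimum total DTW distance
    to all templates in the class."""
    return min(templates, key=lambda t: sum(dtw_distance(t, u) for u in templates))

def dtw_sample_score(new_sample, class_templates):
    """Score a new sample by how far it shifts the class exemplar:
    retrain (re-pick the exemplar) with the sample included, then take
    the DTW distance between the new and old exemplars."""
    old = exemplar(class_templates)
    new = exemplar(class_templates + [new_sample])
    return dtw_distance(new, old)
```

A score of 0 means the new sample didn't change the exemplar at all; a large score flags a sample that substantially reshapes the class's representative template.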

damellis commented 8 years ago

This should maybe be an additional (classifier-specific) score rather than a replacement for our current information gain (the negative log of the likelihood for the assigned class). Our current approach does provide useful information w.r.t. the separability of classes. The additional score could be useful for understanding the impact of a new sample on the distribution of an ANBC or GMM classifier.
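One way to report both metrics side by side (a hypothetical sketch, not the ESP interface):

```python
import math

def score_sample(p_assigned, classifier_specific_score):
    """Report the existing information-gain score alongside a
    classifier-specific score, rather than replacing one with the other."""
    return {
        "information_gain": -math.log(max(p_assigned, 1e-12)),
        "classifier_specific": classifier_specific_score,
    }
```

Keeping both lets the UI surface class separability (information gain) and the sample's effect on the class distribution (the new score) independently.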