chaubold opened 8 years ago
I don't see how it can be that the Transition Classifier keeps such a good result even for weird thresholds.
Thanks for generating the plots!
Well, the transition classifier learns to be pretty certain about its decisions: only very few samples have a predicted probability around 0.5 (see below).
The distance-based classification will definitely be wrong for some samples at every distance where both true positive and true negative examples occur.
I've put the ipython notebook with some additional graphs in your folder. Let's try this on some other (and much more complicated) datasets tomorrow.
We want to know how much better the transition classifier does with respect to the pure-distance based computation of a "transition probability".
For this, predict the probabilities (add a `predictProbability` method to the TransitionClassifier) for all validation samples using the trained random forest. Now threshold the probabilities `p` at some threshold `t` in [0, 1]: e.g. when `t = 0.3`, every sample with a probability `p > t` for being a good transition will be classified as positive, otherwise negative. (The Random Forest's `predictLabel` method does this with `t = 0.5`.)

Do the same with transition probabilities derived from distances as follows:
Then compute precision, recall, and f-measure for each threshold and plot a graph that looks roughly like the following (curves are made up!):

![img_20151210_115043](https://cloud.githubusercontent.com/assets/16854/11713680/28560066-9f35-11e5-8b19-f6e94a2c2fcd.jpg)
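The thresholding and evaluation steps could be sketched roughly like this (a minimal sketch, not code from the repository; `y_true` and `rf_probs` are placeholder arrays standing in for the validation labels and the output of the proposed `predictProbability` method):

```python
import numpy as np

def pr_f1_at_threshold(y_true, probs, t):
    """Classify positive when p > t, then compute precision, recall and F-measure."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(probs) > t          # positive iff predicted probability exceeds t
    tp = np.sum(pred & y_true)            # true positives
    fp = np.sum(pred & ~y_true)           # false positives
    fn = np.sum(~pred & y_true)           # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Sweep thresholds t in [0, 1] and collect one (precision, recall, f1) triple per t.
thresholds = np.linspace(0.0, 1.0, 21)
y_true = [1, 1, 0, 0]                     # placeholder validation labels
rf_probs = [0.9, 0.6, 0.4, 0.1]          # placeholder predicted probabilities
curves = [pr_f1_at_threshold(y_true, rf_probs, t) for t in thresholds]
# Plot e.g. with matplotlib: plt.plot(thresholds, [c[2] for c in curves])
```

Running the same sweep a second time with the distance-derived transition probabilities in place of `rf_probs` gives the second set of curves for the comparison plot.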