azmfaridee / mothur

This is a GSoC 2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mothur's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0

Implement Mean Decrease in Accuracy for Feature Ranking Like the R Version #32

Open azmfaridee opened 11 years ago

azmfaridee commented 11 years ago

The article "Random Forest Classification and Variable Importance Based on Proteomic Profiles from Mass Spectrometry" might come in handy.

Here is an excerpt from the article.

The permutation-based mean decrease in accuracy was used to measure the importance of each variable to the classification (Breiman, 2001). For each tree, the class membership of each OOB sample is predicted using the tree, and the number of correctly classified samples is counted. To measure the importance of variable xj, values of xj are permuted in the OOB sample, and the class membership of the OOB samples are predicted again from the tree. The number of correctly classified samples after permutation is subtracted from the original count of correctly classified samples and divided by the number of OOB samples for that tree, thus giving the decrease in classification accuracy as a proportion of samples. This permutation procedure is repeated for each tree in the forest, and the mean decrease in accuracy is defined as the average of these values over all trees in the forest (multiplied by 100 and presented as a mean percentage decrease in accuracy).
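
To make the procedure in the excerpt concrete, here is a rough Python sketch of it. It is not mothur's code and not the R implementation. The function name `mean_decrease_in_accuracy` and the `(predict, oob_indices)` representation of a tree are assumptions made purely for illustration, and `predict` is assumed to be any callable that maps one feature row to a class label.

```python
import random


def mean_decrease_in_accuracy(trees, X, y, n_features, seed=0):
    """Permutation-based mean decrease in accuracy, as a percentage per feature.

    trees      -- list of (predict, oob_indices) pairs (hypothetical layout);
                  predict(row) returns a class label for one feature row
    X, y       -- feature rows and their true class labels
    n_features -- number of columns in each row of X
    """
    rng = random.Random(seed)
    totals = [0.0] * n_features

    for predict, oob in trees:
        if not oob:
            # A tree with no OOB samples contributes a decrease of 0.
            continue
        rows = [X[i] for i in oob]
        labels = [y[i] for i in oob]

        # Correctly classified OOB samples before any permutation.
        baseline = sum(int(predict(r) == t) for r, t in zip(rows, labels))

        for j in range(n_features):
            # Permute the values of feature j among the OOB samples only.
            column = [r[j] for r in rows]
            rng.shuffle(column)

            correct = 0
            for r, value, t in zip(rows, column, labels):
                permuted_row = list(r)
                permuted_row[j] = value
                correct += int(predict(permuted_row) == t)

            # Decrease in accuracy as a proportion of this tree's OOB samples.
            totals[j] += (baseline - correct) / len(oob)

    # Average over all trees, reported as a percentage.
    return [100.0 * total / len(trees) for total in totals]
```

With a toy "tree" that only looks at the first feature, for example `lambda row: row[0]`, permuting the second feature cannot change any prediction, so its score comes out as exactly zero, while the score for the first feature depends on the shuffle.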
