azmfaridee / mothur

This is a GSoC 2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mothur's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0

Week 5: Implement the Basic Building Blocks of Random Forest Algorithm #14

Closed azmfaridee closed 12 years ago

azmfaridee commented 12 years ago

Parent Issue #3

As per the initial proposal, start coding the feature selection algorithm for regularized random forest. The particular task is to implement the bootstrapping part as well as the helper functions. Relevant functions could be:

createBootStrappedSample()
selectAttributesToInclude()
getCategoryProbability()
getHighestCountCategory()
calculateEntropy()
calculateTreeErrorRate()
calculateAttributeImportance()
randomlyShuffleAttribute()

End of Week Deliverable:

A code segment that can create N times as much bootstrapped data as was given at the beginning, as well as the other helper functions mentioned above.
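A minimal sketch of the bootstrapping step, assuming the data is held as a vector of rows with one row per sample. The `Row` alias, the parameter names, and the index-returning helper are assumptions made for illustration, not the signatures used in the fork. Rows are drawn uniformly with replacement, and the caller decides how many to draw.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical row type: the feature values of one sample.
using Row = std::vector<double>;

// Draw `sampleCount` row indices uniformly at random, with replacement.
// Returning indices lets the caller work out which rows were never drawn.
std::vector<std::size_t> createBootStrappedSampleIndices(std::size_t rowCount,
                                                         std::size_t sampleCount,
                                                         std::mt19937& rng) {
    assert(rowCount > 0);
    std::uniform_int_distribution<std::size_t> pick(0, rowCount - 1);
    std::vector<std::size_t> indices;
    indices.reserve(sampleCount);
    for (std::size_t i = 0; i < sampleCount; ++i) {
        indices.push_back(pick(rng));
    }
    return indices;
}

// Build the bootstrapped data set by copying the drawn rows.
std::vector<Row> createBootStrappedSample(const std::vector<Row>& data,
                                          std::size_t sampleCount,
                                          std::mt19937& rng) {
    std::vector<Row> sample;
    sample.reserve(sampleCount);
    for (std::size_t index : createBootStrappedSampleIndices(data.size(), sampleCount, rng)) {
        sample.push_back(data[index]);
    }
    return sample;
}
```

Passing `sampleCount = N * data.size()` would give N times the original amount of data. Calling the function N times with `sampleCount = data.size()` would give N separate bootstrapped sets.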

Note:

azmfaridee commented 12 years ago

@kdiverson It's taking a little time to warm up my C++ skills. I'm setting up the data structures and ran into a bit of a problem with vectors of some classes: they are being copied by reference, not by value. I'll take care of that in no time. Feel free to check out the code and let me know your observations.
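The comment does not say exactly how the vectors were declared. One common way to hit this in C++ is storing pointers in a vector: copying the vector then copies the pointers, so both vectors share the same objects. The sketch below contrasts the two cases; the `Sample` struct is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class standing in for whatever is stored in the vectors.
struct Sample {
    std::vector<int> abundances;
};

// A vector of objects is deep-copied: changing the copy leaves the original alone.
bool copyByValueIsIndependent() {
    Sample s;
    s.abundances = {1, 2, 3};
    std::vector<Sample> original;
    original.push_back(s);

    std::vector<Sample> copy = original;
    copy[0].abundances[0] = 99;
    return original[0].abundances[0] == 1;
}

// A vector of pointers is shallow-copied: both vectors point at the same object.
bool copyOfPointersIsShared() {
    Sample shared;
    shared.abundances = {1, 2, 3};
    std::vector<Sample*> original;
    original.push_back(&shared);

    std::vector<Sample*> copy = original;
    copy[0]->abundances[0] = 99;
    return original[0]->abundances[0] == 99;
}
```

If independent copies are needed, storing the objects themselves in the vector (or cloning each pointed-to object explicitly) avoids the sharing.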

azmfaridee commented 12 years ago

The following functions have been implemented:

createBootStrappedSample()
selectAttributesToInclude()
getCategoryProbability()
getHighestCountCategory()
calculateEntropy()
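For reference, three of these helpers can be sketched as follows, assuming class labels are stored as integers and entropy is the Shannon entropy in bits. The signatures are assumptions for illustration and may differ from the code in the fork.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Fraction of the labels that fall into each category.
std::map<int, double> getCategoryProbability(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, std::size_t> counts;
    for (int label : labels) {
        ++counts[label];
    }
    std::map<int, double> probabilities;
    for (const auto& entry : counts) {
        probabilities[entry.first] = static_cast<double>(entry.second) / labels.size();
    }
    return probabilities;
}

// Most frequent category; ties go to the smallest label because
// std::map iterates in ascending key order.
int getHighestCountCategory(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, std::size_t> counts;
    for (int label : labels) {
        ++counts[label];
    }
    int best = labels.front();
    std::size_t bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}

// Shannon entropy in bits: H = -sum(p * log2(p)) over the categories present.
double calculateEntropy(const std::vector<int>& labels) {
    double entropy = 0.0;
    for (const auto& entry : getCategoryProbability(labels)) {
        entropy -= entry.second * std::log2(entry.second);
    }
    return entropy;
}
```

With these definitions, a set of labels split evenly between two categories has an entropy of 1 bit, and a set with a single category has an entropy of 0.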

The following functions have not been implemented yet. I have postponed them to a later week so that the error rate can be calculated once the trees are created.

calculateTreeErrorRate()
calculateAttributeImportance()
randomlyShuffleAttribute()
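The issue does not describe how these three functions will work. One plausible reading is the usual permutation approach: measure a tree's error rate, shuffle one attribute's values across the rows, and measure the error rate again. The sketch below follows that idea, with the tree stood in for by a hypothetical `Classifier` callback. All names and signatures are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Row = std::vector<double>;                    // hypothetical row type
using Classifier = std::function<int(const Row&)>;  // stand-in for a trained tree

// Fraction of rows the classifier labels incorrectly.
double calculateTreeErrorRate(const Classifier& tree,
                              const std::vector<Row>& rows,
                              const std::vector<int>& labels) {
    assert(!rows.empty() && rows.size() == labels.size());
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (tree(rows[i]) != labels[i]) {
            ++wrong;
        }
    }
    return static_cast<double>(wrong) / rows.size();
}

// Return a copy of the rows with one attribute's values permuted across rows.
std::vector<Row> randomlyShuffleAttribute(std::vector<Row> rows,
                                          std::size_t attribute,
                                          std::mt19937& rng) {
    std::vector<double> column;
    column.reserve(rows.size());
    for (const Row& row : rows) {
        column.push_back(row.at(attribute));
    }
    std::shuffle(column.begin(), column.end(), rng);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        rows[i][attribute] = column[i];
    }
    return rows;
}

// Importance as the increase in error rate after shuffling the attribute.
double calculateAttributeImportance(const Classifier& tree,
                                    const std::vector<Row>& rows,
                                    const std::vector<int>& labels,
                                    std::size_t attribute,
                                    std::mt19937& rng) {
    double baseline = calculateTreeErrorRate(tree, rows, labels);
    double permuted = calculateTreeErrorRate(
        tree, randomlyShuffleAttribute(rows, attribute, rng), labels);
    return permuted - baseline;
}
```

In a random forest this measurement is usually taken on the rows left out of each tree's bootstrapped sample. The sketch simply uses whatever rows it is given.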