azmfaridee / mothur

This is a GSoC 2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mothur's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0

Week 14, 15 & 16: Make the Ported C++ Code and Mock Module Work Together with Each Other #25

Closed azmfaridee closed 12 years ago

azmfaridee commented 12 years ago

Weekly Update Issues: #3, #14, #15, #16, #17, #19, #20, #21 and #23

The title says it all. By this time we should have a mock class that we can call from mothur's command line. We also have a C++ implementation of the random forest algorithm. We now need to make them work together.

Other Issue Related to Mothur Integration: #4, #5, #6 and #7

azmfaridee commented 12 years ago

@kdiverson @mothur-westcott As I mentioned earlier, the evaluateSample() function seems to be our bottleneck. Any way of optimizing it, or pruning the tree (which would cut down the traversal depth), would speed up the program.

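  // Walk from the root to a leaf and return that leaf's class label.
  int evaluateSample(vector<int> testSample) {
    TreeNode *node = rootNode;
    while (true) {
      // A leaf holds the predicted class for this sample.
      if (node->checkIsLeaf() == true) { return node->getOutputClass(); }
      // Otherwise compare the sample's value for this node's split feature with the split value.
      int sampleSplitFeatureValue = testSample[node->getSplitFeatureIndex()];
      if (sampleSplitFeatureValue < node->getSplitFeatureValue()) { node = node->getLeftChildNode(); }
      else { node = node->getRightChildNode(); }
    }
  }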
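One thing that may be worth checking before pruning: evaluateSample() as posted takes its vector<int> argument by value, so every call copies the whole feature vector. Whether that copy matters here is an assumption that profiling would have to confirm. The sketch below shows the same traversal with the sample passed by const reference. The Node struct and the makeLeaf/makeSplit helpers are hypothetical stand-ins for the project's TreeNode class, not its actual API, and the code assumes a C++14 compiler for std::make_unique.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's TreeNode; only what the sketch needs.
struct Node {
    bool isLeaf = false;
    int outputClass = -1;        // meaningful only for leaves
    int splitFeatureIndex = -1;  // meaningful only for internal nodes
    int splitFeatureValue = 0;   // samples with a smaller value go left
    std::unique_ptr<Node> left, right;
};

// Hypothetical helper: build a leaf that predicts outputClass.
std::unique_ptr<Node> makeLeaf(int outputClass) {
    auto n = std::make_unique<Node>();
    n->isLeaf = true;
    n->outputClass = outputClass;
    return n;
}

// Hypothetical helper: build an internal node with two children.
std::unique_ptr<Node> makeSplit(int featureIndex, int featureValue,
                                std::unique_ptr<Node> left,
                                std::unique_ptr<Node> right) {
    auto n = std::make_unique<Node>();
    n->splitFeatureIndex = featureIndex;
    n->splitFeatureValue = featureValue;
    n->left = std::move(left);
    n->right = std::move(right);
    return n;
}

// Same root-to-leaf walk as the posted code, but the sample is taken by
// const reference, so no copy of the feature vector is made per call.
int evaluateSample(const Node& root, const std::vector<int>& testSample) {
    const Node* node = &root;
    while (!node->isLeaf) {
        int value = testSample[node->splitFeatureIndex];
        node = (value < node->splitFeatureValue) ? node->left.get()
                                                 : node->right.get();
    }
    return node->outputClass;
}
```

The traversal itself does one comparison per level, so if the per-call copy turns out to be negligible, the remaining cost is governed by tree depth and by how many times the function is called.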

@kdiverson You said earlier that you have the book C4.5: Programs for Machine Learning by J. Ross Quinlan in your library. Could you try to find out whether there is anything on Tree Pruning in the book and let me know? I think Tree Pruning would be one of the ways to optimize the implementation.
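To make the idea concrete, here is a minimal sketch of the simplest kind of tree simplification: collapsing a split whose two children are leaves that predict the same class. This is not the error-based pruning described in the C4.5 book. It is only a prediction-preserving cleanup that shortens some root-to-leaf paths. The Node struct and the collapseRedundantSplits name are hypothetical and do not come from the mothur codebase.

```cpp
#include <memory>

// Hypothetical stand-in for the project's TreeNode; only what the sketch needs.
struct Node {
    bool isLeaf = false;
    int outputClass = -1;        // meaningful only for leaves
    int splitFeatureIndex = -1;  // meaningful only for internal nodes
    int splitFeatureValue = 0;
    std::unique_ptr<Node> left, right;
};

// Bottom-up pass: whenever both children of a split are leaves with the same
// class, the split cannot change the prediction, so replace it with one leaf.
// Returns the number of splits collapsed.
int collapseRedundantSplits(Node& node) {
    if (node.isLeaf) return 0;

    // Simplify the subtrees first so that collapses can propagate upward.
    int collapsed = collapseRedundantSplits(*node.left) +
                    collapseRedundantSplits(*node.right);

    if (node.left->isLeaf && node.right->isLeaf &&
        node.left->outputClass == node.right->outputClass) {
        node.outputClass = node.left->outputClass;  // copy before freeing children
        node.isLeaf = true;
        node.left.reset();
        node.right.reset();
        ++collapsed;
    }
    return collapsed;
}
```

Because this pass never changes what the tree predicts, it is safe to run unconditionally. Pruning that removes splits based on estimated error would cut depth further but can change predictions, so it would need to be evaluated against accuracy.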

kdiverson commented 12 years ago

@darthxaher I'll try to head over to the engineering school this week and find that book. Sorry I didn't get to this before.

EDIT: actually it looks like I can have chapters of the book emailed to me. Can you look at the table of contents [0] and tell me what pages (inclusive) you want?

[0] http://books.google.com/books?id=HExncpjbYroC&printsec=frontcover&source=gbs_ViewAPI#v=onepage&q&f=false

azmfaridee commented 12 years ago

> EDIT: actually it looks like I can have chapters of the book emailed to me. Can you look at the table of contents and tell me what pages (inclusive) you want?

@kdiverson I would like to have a look at Chapter 4 (Pruning Decision Trees). From what I've read of Chapter 4 in Google Books, there are references to Chapters 3 and 8, where the author discusses pruning in more detail. Interestingly, Chapter 8 isn't in the TOC. So, basically, Chapter 4 is the most important thing we need now.