azmfaridee / mothur

This is GSoC2012 fork of 'Mothur'. We are trying to implement a number of 'Feature Selection' algorithms for microbial ecology data and incorporate them into mother's main codebase.
https://github.com/mothur/mothur
GNU General Public License v3.0
3 stars 1 forks source link

Memory leak in RandomForest #39

Open mothur-westcott opened 11 years ago

mothur-westcott commented 11 years ago

This is deleting the local decisionTree. why not?

for (int i = 0; i < decisionTrees.size(); i++) { delete decisionTrees[i]; }

Also, why isn't Forest class deleting since it "owns" the decisionTrees? The Forest destructor will get called because RandomForest inherits from it.

I think you are trying to force the call of the decisionTree deconstructor by casting it here, but that's not happening.

virtual ~RandomForest() {
        for (vector<AbstractDecisionTree*>::iterator it = decisionTrees.begin(); it != decisionTrees.end(); it++) {
            // we know that this is decision tree, so we can do a dynamic_case<DecisionTree*> here
            DecisionTree* decisionTree = dynamic_cast<DecisionTree*>(*it);
            // calling the destructor by deleting
            delete decisionTree;
        }
    }
azmfaridee commented 11 years ago

This is deleting the local decisionTree. why not?

for (int i = 0; i < decisionTrees.size(); i++) { delete decisionTrees[i]; }

This will work, but only if you use virtual destructors. The decisionTrees pointers that you have is of base type AbstractDecisionTree, if you delete that without virtual destructors, only the base destructor will get called, the derived destructor will not get called. See this example.

#include <cstdlib>
#include <iostream>
#include <vector>

#define NUM_TREES 2

using namespace std;

class AbstractDecisionTree {
public:
    AbstractDecisionTree () {
        cout << "AbstractDecisionTree CTor" << endl;
    }

    ~AbstractDecisionTree () {
        cout << "AbstractDecisionTree DTor" << endl;
    }
};

class DecisionTree : public AbstractDecisionTree {
public:
    DecisionTree () {
        cout << "DecisionTree CTor" << endl; 
    }

    ~DecisionTree () {
        cout << "DecisionRree DTor" << endl;
    }
};

class Forest {
public:
    vector<AbstractDecisionTree*> absDTrees;

    Forest () {
        cout << "Forest CTor" << endl;
    }

    ~Forest () {
        cout << "Forest DTor" << endl;
    }
};

class RandomForest : public Forest {
public:
    RandomForest () {
        cout << "RandomForest CTor" << endl;
        for (int i = 0; i < NUM_TREES; i++) {
            cout << "Creating " << i << "(th) decision tree" << endl;
            DecisionTree* dtre = new DecisionTree();
            absDTrees.push_back(dtre);
        }
    }

    ~RandomForest () {
        cout << "RandomForest DTor" << endl;
        for (int i = 0; i < NUM_TREES; i++) {
            cout << "Deleting " << i << "(th) decision tree" << endl;
            delete absDTrees[i];
        }
    }
};

int main(int argc, char** argv) {
    cout << "Starting main program" << endl;
    RandomForest rf;
    cout << "End of main program" << endl;
    return 0;
}

The output is as follows (notice that the destructors of DecisionTree are not being called):

Starting main program
Forest CTor
RandomForest CTor
Creating 0(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
Creating 1(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
End of main program
RandomForest DTor
Deleting 0(th) decision tree
AbstractDecisionTree DTor
Deleting 1(th) decision tree
AbstractDecisionTree DTor
Forest DTor

If you add virtual destructors then the output becomes right, so in your point (considering we always use virtual destructors in our code) you are right, but my code is also right (see the last part of the reply).

#include <cstdlib>
#include <iostream>
#include <vector>

#define NUM_TREES 2

using namespace std;

class AbstractDecisionTree {
public:
    AbstractDecisionTree () {
        cout << "AbstractDecisionTree CTor" << endl;
    }

    virtual ~AbstractDecisionTree () {
        cout << "AbstractDecisionTree DTor" << endl;
    }
};

class DecisionTree : public AbstractDecisionTree {
public:
    DecisionTree () {
        cout << "DecisionTree CTor" << endl; 
    }

    virtual ~DecisionTree () {
        cout << "DecisionRree DTor" << endl;
    }
};

class Forest {
public:
    vector<AbstractDecisionTree*> absDTrees;

    Forest () {
        cout << "Forest CTor" << endl;
    }

    virtual ~Forest () {
        cout << "Forest DTor" << endl;
    }
};

class RandomForest : public Forest {
public:
    RandomForest () {
        cout << "RandomForest CTor" << endl;
        for (int i = 0; i < NUM_TREES; i++) {
            cout << "Creating " << i << "(th) decision tree" << endl;
            DecisionTree* dtre = new DecisionTree();
            absDTrees.push_back(dtre);
        }
    }

    virtual ~RandomForest () {
        cout << "RandomForest DTor" << endl;
        for (int i = 0; i < NUM_TREES; i++) {
            cout << "Deleting " << i << "(th) decision tree" << endl;
            delete absDTrees[i];
        }
    }
};

int main(int argc, char** argv) {
    cout << "Starting main program" << endl;
    RandomForest rf;
    cout << "End of main program" << endl;
    return 0;
}
Starting main program
Forest CTor
RandomForest CTor
Creating 0(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
Creating 1(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
End of main program
RandomForest DTor
Deleting 0(th) decision tree
DecisionRree DTor
AbstractDecisionTree DTor
Deleting 1(th) decision tree
DecisionRree DTor
AbstractDecisionTree DTor
Forest DTor

Also, why isn't Forest class deleting since it "owns" the decisionTrees? The Forest destructor will get called because RandomForest inherits from it.

Forest class owns the decisionTrees, you are right, but it's the RandomForest class that instantiates those variables by dynamic allocation, so it appeared to me that it was RandomForest's 'responsibility' to delete them. You are valid in your point that we can delete them in Forest class as well.

I think you are trying to force the call of the decisionTree deconstructor by casting it here, but that's not happening.

virtual ~RandomForest() {
        for (vector<AbstractDecisionTree*>::iterator it = decisionTrees.begin(); it != decisionTrees.end(); it++) {
            // we know that this is decision tree, so we can do a dynamic_case<DecisionTree*> here
            DecisionTree* decisionTree = dynamic_cast<DecisionTree*>(*it);
            // calling the destructor by deleting
            delete decisionTree;
        }
    }

Casting is the right way to do this if you are not using virtual destructors, but even if you are using virtual destructors it does NOT have any side effects. If I modify the example a little bit:

virtual ~RandomForest () {
        cout << "RandomForest DTor" << endl;
        for (int i = 0; i < NUM_TREES; i++) {
            cout << "Deleting " << i << "(th) decision tree" << endl;
            DecisionTree* dtre = dynamic_cast<DecisionTree*>(absDTrees[i]);
            delete dtre;
        }
    }

The output appears right:

Starting main program
Forest CTor
RandomForest CTor
Creating 0(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
Creating 1(th) decision tree
AbstractDecisionTree CTor
DecisionTree CTor
End of main program
RandomForest DTor
Deleting 0(th) decision tree
DecisionRree DTor
AbstractDecisionTree DTor
Deleting 1(th) decision tree
DecisionRree DTor
AbstractDecisionTree DTor
Forest DTor

So what you are saying might be the reason for memory leak, is not valid. All I see is that there are multiple ways to achieve the same effect, your solution is valid, but so if mine. The memory leak that we are experiencing is not because of adding this code segment (although not having the delete command was a reason for one of the leaks, and this re-addition of code has solved that). I've explained that to @kdiverson a few times, I'd try to explain it later in another post.