chriswbartley / monoensemble

High Performance Monotone Boosting and Random Forest Classification
http://monoensemble.readthedocs.io/en/latest/index.html

Error when trying to access feature_importances_ #6

Open vasselai opened 3 years ago

vasselai commented 3 years ago

After a few days of re-testing the updated version, I find that after fitting any model (for example the RF one from the official docs), accessing the fitted model's feature_importances_ raises the following error:

Traceback (most recent call last):
  File "/home/vasselai/.local/lib/python3.6/site-packages/joblib/parallel.py", line 820, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "/usr/lib/python3.6/queue.py", line 161, in get
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/vasselai/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py", line 450, in feature_importances_
    for tree in self.estimators_ if tree.tree_.node_count > 1)
  File "/home/vasselai/.local/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/vasselai/.local/lib/python3.6/site-packages/joblib/parallel.py", line 831, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "/home/vasselai/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py", line 450, in <genexpr>
    for tree in self.estimators_ if tree.tree_.node_count > 1)
AttributeError: 'MonoGradientBoostingClassifier' object has no attribute 'tree_'

At first I assumed that the monoensemble code was simply not storing the fitted tree object in a tree_ attribute, but storing it there did not solve the problem, so something else is at play. Since being able to inspect feature importances is critical, I thought it best to raise this officially.
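The failure mode in the traceback can be reproduced without sklearn or monoensemble. The sketch below uses hypothetical stand-in classes (`StandInEstimator`, `StandInForest`) rather than the real library code. It only illustrates that a lazy generator filtering on `tree.tree_.node_count` raises `AttributeError` once it is consumed, if the ensemble members do not expose `tree_`.

```python
class StandInEstimator:
    """Hypothetical ensemble member that, like the one in the traceback, has no `tree_`."""
    feature_importances_ = [0.5, 0.5]


class StandInForest:
    """Hypothetical forest whose property mirrors the filter seen in the traceback."""

    def __init__(self, estimators):
        self.estimators_ = estimators

    @property
    def feature_importances_(self):
        # The generator is lazy: `tree.tree_` is only looked up when it is consumed.
        selected = (tree.feature_importances_
                    for tree in self.estimators_
                    if tree.tree_.node_count > 1)
        return list(selected)  # consuming the generator triggers the lookup


def try_importances(forest):
    """Return the importances, or the error text if the lookup fails."""
    try:
        return forest.feature_importances_
    except AttributeError as exc:
        return "AttributeError: {}".format(exc)


result = try_importances(StandInForest([StandInEstimator()]))
print(result)
```

Because the lookup happens inside the generator, the error surfaces wherever the generator is consumed, which would be consistent with the `<genexpr>` frame appearing underneath joblib's `dispatch_one_batch` in the traceback.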

vasselai commented 3 years ago

Actually, an overriding feature_importances_ property seems to be entirely missing from the MonoRandomForestClassifier class, doesn't it?

Perhaps it can be implemented analogously to the one that exists for BaseMonoGradientBoosting, like:

@property
def feature_importances_(self):
    """Return the feature importances (the higher, the more important the
       feature).

    Returns
    -------
    feature_importances_ : array, shape = [n_features]
    """
    total_sum = np.zeros((self.n_features_, ), dtype=np.float64)
    for stage in self.estimators_:
        stage_sum = sum(rule_ensemble.tree.feature_importances_
                        for rule_ensemble in stage[0]) / len(stage[0])
        total_sum += stage_sum

    importances = total_sum / len(self.estimators_)
    return importances
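To check the averaging logic of the snippet above in isolation, here is a minimal sketch that runs on hypothetical stand-in objects. The nested layout (each stage holding a list of rule ensembles at `stage[0]`, each with a `.tree` that exposes `feature_importances_`) is taken from the proposed snippet and has not been verified against monoensemble itself. Plain Python lists replace NumPy arrays so that the example has no dependencies.

```python
class StandInTree:
    """Hypothetical tree exposing only per-feature importances."""

    def __init__(self, importances):
        self.feature_importances_ = importances


class StandInRuleEnsemble:
    """Hypothetical rule ensemble wrapping a tree, as assumed by the snippet."""

    def __init__(self, importances):
        self.tree = StandInTree(importances)


def average_importances(estimators, n_features):
    """Average importances within each stage, then across stages."""
    total = [0.0] * n_features
    for stage in estimators:
        members = stage[0]  # assumed layout: rule ensembles live at index 0
        for j in range(n_features):
            stage_mean = sum(m.tree.feature_importances_[j]
                             for m in members) / len(members)
            total[j] += stage_mean
    return [t / len(estimators) for t in total]


# Two stages: the first has two rule ensembles, the second has one.
estimators = [
    [[StandInRuleEnsemble([1.0, 0.0]), StandInRuleEnsemble([0.5, 0.5])]],
    [[StandInRuleEnsemble([0.0, 1.0])]],
]
importances = average_importances(estimators, n_features=2)
print(importances)  # [0.375, 0.625]
```

Each stage contributes equally to the result regardless of how many rule ensembles it holds. Whether that weighting is appropriate for a random forest is a design choice the real implementation would need to settle.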