dclambert / pyensemble

An implementation of Caruana et al's Ensemble Selection algorithm in Python, based on scikit-learn

min_samples_split == 1 raises ValueError in Decision Tree Classifier #5

Open rnmourao opened 6 years ago

rnmourao commented 6 years ago

Hi:

I tested the simplest call of ensemble_train and got a ValueError for the parameter min_samples_split:

Traceback (most recent call last):
  File "pyensemble/ensemble_train.py", line 202, in <module>
    ens.fit(X_train, y_train)
  File "/home/mourao/income_prediction/pyensemble/ensemble.py", line 290, in fit
    self.fit_models(X, y)
  File "/home/mourao/income_prediction/pyensemble/ensemble.py", line 325, in fit_models
    model.fit(X[train_inds], y[train_inds])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 194, in fit
    % self.min_samples_split)
ValueError: min_samples_split must be an integer greater than 1 or a float in (0.0, 1.0]; got the integer 1

I solved the problem by removing 1 from the min_samples_split list in model_library.py:


def build_decisionTreeClassifiers(random_state=None):
    rs = check_random_state(random_state)

    param_grid = {
        'criterion': ['gini', 'entropy'],
        'max_features': [None, 'auto', 'sqrt', 'log2'],
        'max_depth': [None, 1, 2, 5, 10],
        # 1 removed: scikit-learn requires an integer greater than 1 here
        'min_samples_split': [2, 5, 10],
        'random_state': [rs.random_integers(100000) for i in xrange(3)],
    }

    return build_models(DecisionTreeClassifier, param_grid)
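
As an alternative to editing the list by hand, the grid could be filtered before the models are built. The sketch below is only an illustration: the helper names `is_valid_min_samples_split` and `drop_invalid_min_samples_split` are hypothetical and not part of pyensemble or scikit-learn, the validity rule is copied from the ValueError message above, and the starting list `[1, 2, 5, 10]` is an assumption about what the grid looked like before the fix.

```python
def is_valid_min_samples_split(value):
    """Hypothetical helper: apply the rule quoted in the ValueError message.

    Accepts an integer greater than 1 or a float in (0.0, 1.0].
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treat it as invalid here
        return False
    if isinstance(value, int):
        return value > 1
    if isinstance(value, float):
        return 0.0 < value <= 1.0
    return False


def drop_invalid_min_samples_split(param_grid):
    """Hypothetical helper: return a copy of the grid without rejected values."""
    cleaned = dict(param_grid)
    if 'min_samples_split' in cleaned:
        cleaned['min_samples_split'] = [
            v for v in cleaned['min_samples_split']
            if is_valid_min_samples_split(v)
        ]
    return cleaned


# Assumed pre-fix grid (only the relevant keys are shown)
grid = {
    'max_depth': [None, 1, 2, 5, 10],
    'min_samples_split': [1, 2, 5, 10],
}
cleaned_grid = drop_invalid_min_samples_split(grid)
print(cleaned_grid['min_samples_split'])  # [2, 5, 10]
```

The filtered dictionary could then be passed to build_models in place of the original param_grid, so the same grid would keep working with scikit-learn versions that accept 1 as well as those that reject it.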