flennerhag / mlens

ML-Ensemble – high performance ensemble learning
MIT License

Prediction failing with 1 row of test data #150

Open meetu30 opened 1 year ago

meetu30 commented 1 year ago

Hi, I am creating 100 rows of data, of which I pass 99 as training data and only 1 as test data. Prediction then fails with this error: ValueError: Number of splits 10 is greater than the number of samples: 1.
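
The wording of the error points at a general constraint of k-fold splitting: a dataset cannot be cut into more folds than it has rows. The sketch below is a minimal pure-Python illustration of that constraint only. `kfold_indices` is a hypothetical helper, not mlens's actual indexer, and whether mlens re-applies its fold indexer at predict time is an assumption this sketch does not verify.

```python
def kfold_indices(n_samples, n_splits):
    """Return (train_idx, test_idx) pairs for a simple, unshuffled k-fold split.

    Hypothetical helper for illustration; not part of mlens or scikit-learn.
    """
    if n_splits > n_samples:
        raise ValueError(
            "Number of splits %d is greater than the number of samples: %d."
            % (n_splits, n_samples)
        )
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        stop = start + base + (1 if i < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train_idx, test_idx))
        start = stop
    return folds


# 99 training rows can be split into 10 folds.
folds = kfold_indices(99, 10)

# A single test row cannot be split 10 ways.
try:
    kfold_indices(1, 10)
except ValueError as err:
    message = str(err)
```

With 99 rows the helper returns 10 folds; with 1 row it raises before any fold is built, which matches the shape of the failure described above.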

Below is my code snippet:

import the required libraries

from math import sqrt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, RandomForestRegressor, ExtraTreesRegressor
from mlens.ensemble import SuperLearner

create a list of base-models

def get_models():
    models = list()
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models

cost function for base models

def rmse(yreal, yhat):
    return sqrt(mean_squared_error(yreal, yhat))

create the super learner

def get_super_learner(X):
    ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True, sample_size=len(X), random_state=42)
    # add base models
    models = get_models()
    ensemble.add(models)
    # add the meta model
    ensemble.add_meta(LinearRegression())
    return ensemble

from mlens.visualization import corr_X_y

create the inputs and outputs

X, y = make_regression(n_samples=100, n_features=4, noise=0.5)


X, X_val, y, y_val = train_test_split(X, y, test_size=1, random_state=42)
print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)

create the super learner

ensemble = get_super_learner(X)

fit the super learner

ensemble.fit(X, y)

summarize base learners

print(ensemble.data)
evaluate meta model

yhat = ensemble.predict(X_val)
print('Super Learner: RMSE %.3f' % (rmse(y_val, yhat)))

Output is:

Train (99, 4) (99,) Test (1, 4) (1,)
                                score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  adaboostregressor        67.84     8.31  1.31  0.02  0.02  0.01
layer-1  baggingregressor         65.24     7.93  0.34  0.01  0.00  0.00
layer-1  decisiontreeregressor    80.64    16.22  0.11  0.01  0.00  0.00
layer-1  elasticnet               46.53     8.68  0.08  0.00  0.00  0.00
layer-1  extratreesregressor      56.78    10.63  0.79  0.04  0.00  0.00
layer-1  kneighborsregressor      51.99    13.06  0.00  0.00  0.00  0.00
layer-1  linearregression          0.53     0.07  0.00  0.00  0.00  0.00
layer-1  randomforestregressor    66.39     7.15  0.75  0.03  0.00  0.00
layer-1  svr                     125.71    19.63  0.07  0.00  0.00  0.00

and then the ValueError quoted at the top.

When I build the same super learner manually, as described at https://machinelearningmastery.com/super-learner-ensemble-in-python/, it works, but it does NOT work with mlens.

  1. Kindly help me fix this.
  2. Also, how can I use a random seed to get the same results on every run? I set random_state in train_test_split and in the SuperLearner, but the results still differ.
  3. How was linear regression picked as the meta model in the ensemble.add_meta(LinearRegression()) line? Kindly guide.
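
On question 2, one thing visible in the snippet is that make_regression is called without a random_state, so the generated data may differ between runs even though the split and the ensemble are seeded. The standard-library sketch below illustrates that general principle. `make_data` and `split_last` are hypothetical stand-ins for the data generator and the train/test split; the sketch does not call scikit-learn or mlens and is not a confirmed diagnosis of the behaviour reported here.

```python
import random


def make_data(n_samples, seed=None):
    """Hypothetical stand-in for a data generator such as make_regression."""
    rng = random.Random(seed)  # seed=None draws fresh entropy on every call
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def split_last(data, seed):
    """Shuffle a copy with a fixed seed, then hold out the last row as test data."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return shuffled[:-1], shuffled[-1:]


# Seeding only the split: the underlying data still changes between runs.
train_a, test_a = split_last(make_data(100), seed=42)
train_b, test_b = split_last(make_data(100), seed=42)
unseeded_match = train_a == train_b

# Seeding the generator as well makes the whole pipeline repeatable.
train_c, test_c = split_last(make_data(100, seed=0), seed=42)
train_d, test_d = split_last(make_data(100, seed=0), seed=42)
seeded_match = train_c == train_d and test_c == test_d
```

In the snippet above, the analogous change would be to pass random_state to make_regression as well, a parameter that scikit-learn's make_regression accepts.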