Open meetu30 opened 1 year ago
Hi, I am generating 100 rows of data, of which I use 99 for training and only 1 for testing. But I am getting this error: ValueError: Number of splits 10 is greater than the number of samples: 1.
Below is the code snippet:
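
    from math import sqrt

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    from sklearn.linear_model import LinearRegression, ElasticNet
    from sklearn.svm import SVR
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                                  RandomForestRegressor, ExtraTreesRegressor)
    from mlens.ensemble import SuperLearner
    from mlens.visualization import corr_X_y

    # create a list of base-models
    def get_models():
        models = list()
        models.append(LinearRegression())
        models.append(ElasticNet())
        models.append(SVR(gamma='scale'))
        models.append(DecisionTreeRegressor())
        models.append(KNeighborsRegressor())
        models.append(AdaBoostRegressor())
        models.append(BaggingRegressor(n_estimators=10))
        models.append(RandomForestRegressor(n_estimators=10))
        models.append(ExtraTreesRegressor(n_estimators=10))
        return models

    # cost function for base models
    def rmse(yreal, yhat):
        return sqrt(mean_squared_error(yreal, yhat))

    # create the super learner
    def get_super_learner(X):
        ensemble = SuperLearner(scorer=rmse, folds=10, shuffle=True,
                                sample_size=len(X), random_state=42)
        # add base models
        models = get_models()
        ensemble.add(models)
        # add the meta model
        ensemble.add_meta(LinearRegression())
        return ensemble

    # create the inputs and outputs
    X, y = make_regression(n_samples=100, n_features=4, noise=0.5)
    # split: 99 rows for training, 1 row for testing
    X, X_val, y, y_val = train_test_split(X, y, test_size=1, random_state=42)
    print('Train', X.shape, y.shape, 'Test', X_val.shape, y_val.shape)
    # create the super learner
    ensemble = get_super_learner(X)
    # fit the super learner
    ensemble.fit(X, y)
    # summarize base learners
    print(ensemble.data)
    # evaluate meta model
    yhat = ensemble.predict(X_val)
    print('Super Learner: RMSE %.3f' % (rmse(y_val, yhat)))
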
Output is:

    Train (99, 4) (99,) Test (1, 4) (1,)
                                    score-m  score-s  ft-m  ft-s  pt-m  pt-s
    layer-1  adaboostregressor        67.84     8.31  1.31  0.02  0.02  0.01
    layer-1  baggingregressor         65.24     7.93  0.34  0.01  0.00  0.00
    layer-1  decisiontreeregressor    80.64    16.22  0.11  0.01  0.00  0.00
    layer-1  elasticnet               46.53     8.68  0.08  0.00  0.00  0.00
    layer-1  extratreesregressor      56.78    10.63  0.79  0.04  0.00  0.00
    layer-1  kneighborsregressor      51.99    13.06  0.00  0.00  0.00  0.00
    layer-1  linearregression          0.53     0.07  0.00  0.00  0.00  0.00
    layer-1  randomforestregressor    66.39     7.15  0.75  0.03  0.00  0.00
    layer-1  svr                     125.71    19.63  0.07  0.00  0.00  0.00

and then the ValueError is raised.

When I do the same thing by building the super learner manually, as described here - https://machinelearningmastery.com/super-learner-ensemble-in-python/ - it works fine, but it DOES NOT work with mlens.
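
To make the error message concrete: it reads like a check that compares the number of folds with the number of rows in whatever array is passed in. The sketch below is only an illustration of that arithmetic. `check_folds` and `try_check` are hypothetical helpers written for this post, not mlens functions, and where exactly mlens performs such a check is an assumption.

```python
# Hypothetical helper (not part of mlens): mimics a fold-count validation
# that refuses to split fewer rows than there are folds.
def check_folds(n_samples, n_splits):
    if n_splits > n_samples:
        raise ValueError(
            "Number of splits %d is greater than the number of samples: %d."
            % (n_splits, n_samples)
        )


# Hypothetical wrapper: returns "ok" or the error text instead of raising.
def try_check(n_samples, n_splits=10):
    try:
        check_folds(n_samples, n_splits)
    except ValueError as exc:
        return str(exc)
    return "ok"


train_result = try_check(99)  # 99 training rows, 10 folds
test_result = try_check(1)    # 1 test row, 10 folds

print("train:", train_result)
print("test: ", test_result)
```

With 10 folds, the 99 training rows pass such a check while the single test row does not, which matches the numbers in the message (10 splits, 1 sample).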