SlowMonk closed this issue 5 years ago
see issue #1
Yes, but I want to use cv_split = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0) instead of x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.33, random_state=42)
y_value = array([221900, 180000, 510000, ..., 360000, 400000, 325000])
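For reference, a rough sketch of how a StratifiedShuffleSplit object can stand in for a single train_test_split call. The data here is a made-up stand-in (random features, binary labels), not the dataset from this issue, and it assumes NumPy arrays; with a pandas DataFrame the indexing would go through .iloc instead.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in data: 100 rows, 3 features, binary class labels.
rng = np.random.default_rng(0)
data_x = rng.normal(size=(100, 3))
data_y = np.array([0] * 60 + [1] * 40)

cv_split = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

# split() yields (train indices, test indices) pairs rather than the arrays
# themselves; take the first of the 10 splits here.
train_idx, test_idx = next(cv_split.split(data_x, data_y))

x_train, x_test = data_x[train_idx], data_x[test_idx]
y_train, y_test = data_y[train_idx], data_y[test_idx]
```

Unlike train_test_split, the splitter returns index arrays, so the same cv_split object can also be passed directly as the cv argument of the cross-validation helpers.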
List of machine learning algorithms that will be used for predictions
estimator = [('Logistic Regression', LogisticRegression), ('Ridge Classifier', RidgeClassifier), ('SGD Classifier', SGDClassifier), ('Passive Aggressive Classifier', PassiveAggressiveClassifier), ('SVC', SVC), ('Linear SVC', LinearSVC), ('Nu SVC', NuSVC), ('K-Neighbors Classifier', KNeighborsClassifier), ('Gaussian Naive Bayes', GaussianNB), ('Multinomial Naive Bayes', MultinomialNB), ('Bernoulli Naive Bayes', BernoulliNB), ('Complement Naive Bayes', ComplementNB), ('Decision Tree Classifier', DecisionTreeClassifier), ('Random Forest Classifier', RandomForestClassifier), ('AdaBoost Classifier', AdaBoostClassifier), ('Gradient Boosting Classifier', GradientBoostingClassifier), ('Bagging Classifier', BaggingClassifier), ('Extra Trees Classifier', ExtraTreesClassifier), ('XGBoost', XGBClassifier)]
Separating independent features and dependent feature from the dataset
X_train = titanic.drop(columns='Survived')
y_train = titanic['Survived']
Creating a dataframe to compare the performance of the machine learning models
comparison_cols = ['Algorithm', 'Training Time (Avg)', 'Accuracy (Avg)', 'Accuracy (3xSTD)']
comparison_df = pd.DataFrame(columns=comparison_cols)
Generating training/validation dataset splits for cross validation
cv_split = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
Performing cross-validation to estimate the performance of the models
for idx, est in enumerate(estimator):
comparison_df.set_index(keys='Algorithm', inplace=True)
comparison_df.sort_values(by='Accuracy (Avg)', ascending=False, inplace=True)
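The body of the loop is not shown in the post. A minimal sketch of what such a loop might look like, assuming cross_validate is used with cv_split and that each entry of estimator is a (name, class) pair as in the list above. The data comes from make_classification as a hypothetical stand-in for the Titanic frame, and the estimator list is shortened to two entries.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the Titanic features and labels.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

# Shortened list in the same (name, class) form as the post.
estimator = [('Logistic Regression', LogisticRegression),
             ('Decision Tree Classifier', DecisionTreeClassifier)]

comparison_cols = ['Algorithm', 'Training Time (Avg)', 'Accuracy (Avg)', 'Accuracy (3xSTD)']
cv_split = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

rows = []
for idx, est in enumerate(estimator):
    name, model_class = est
    # The list holds classes, so each one is instantiated before fitting.
    cv_results = cross_validate(model_class(), X_train, y_train,
                                cv=cv_split, scoring='accuracy')
    rows.append({
        'Algorithm': name,
        'Training Time (Avg)': cv_results['fit_time'].mean(),
        'Accuracy (Avg)': cv_results['test_score'].mean(),
        'Accuracy (3xSTD)': cv_results['test_score'].std() * 3,
    })

# Build the comparison frame in one go from the collected rows.
comparison_df = pd.DataFrame(rows, columns=comparison_cols)
```

Collecting the rows in a list and building the DataFrame once avoids growing comparison_df row by row inside the loop.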
Visualizing the performance of the models
and the following error occurred:
ValueError Traceback (most recent call last)
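The traceback is cut off here, so the actual cause is not confirmed. One thing that may be worth checking, assuming y_value is the target passed to cv_split: the values look like continuous prices, and StratifiedShuffleSplit treats every distinct value of y as its own class, which fails when a class has only one member. A sketch of two possible workarounds on made-up data (the arrays and the bin count of 5 are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

# Hypothetical continuous target resembling the y_value array in the post.
rng = np.random.default_rng(0)
y_value = rng.integers(100_000, 900_000, size=200).astype(float)
data_x = rng.normal(size=(200, 4))

# Option 1: plain ShuffleSplit does not look at y, so a continuous
# target is not a problem.
plain_split = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
train_idx, test_idx = next(plain_split.split(data_x))

# Option 2: stratify on quantile bins of the target instead of the raw
# values, so each "class" has many members.
y_bins = pd.qcut(y_value, q=5, labels=False, duplicates='drop')
cv_split = StratifiedShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
strat_train_idx, strat_test_idx = next(cv_split.split(data_x, y_bins))
```

In option 2 the bins are only used to drive the split; the model would still be fit on the original y_value entries selected by the returned indices.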