Closed: RohanSahana closed this 3 years ago.
@RohanSahana
X_train = X
Y_train = Y
These lines defeat the purpose of the train_test_split() function. You can also see that the score (accuracy) your model attains becomes 100%, which is unrealistic for a properly evaluated machine learning model. These lines put the examples you set aside for testing back into your training data. So when you fit your model, it also trains on those examples, which should not be in X_train and Y_train. When you then test your model on X_test and Y_test, you are only making predictions on examples it has already trained on, which is why it scores 100%.
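To illustrate the leakage described above, here is a minimal plain-Python sketch. It assumes a toy two-class dataset with some flipped labels, not the breast cancer data from this PR. The `train_test_split` below is a hypothetical stand-in that returns values in the same order as scikit-learn's (X_train, X_test, y_train, y_test), and the classifier is a 1-nearest-neighbour model, chosen because it memorizes its training data: any test row that leaks into training is found at distance 0 and predicted perfectly.

```python
import math
import random


def train_test_split(X, y, test_size=0.25, seed=0):
    """Hypothetical plain-Python stand-in for scikit-learn's train_test_split.

    Shuffles the indices once and holds out `test_size` of them for testing.
    Returns X_train, X_test, y_train, y_test (same order as scikit-learn).
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def predict_1nn(X_train, y_train, x):
    """1-nearest neighbour: return the label of the closest training point."""
    nearest = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return y_train[nearest]


def accuracy(X_train, y_train, X_eval, y_eval):
    """Fraction of evaluation points whose 1-NN prediction matches the label."""
    hits = sum(predict_1nn(X_train, y_train, x) == t for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)


def make_noisy_data(n=200, seed=1):
    """Toy two-class data (assumed, not the PR's dataset) with ~10% flipped labels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label else -1.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        # Label noise makes a perfect score on unseen data unrealistic.
        y.append(1 - label if rng.random() < 0.1 else label)
    return X, y


X, y = make_noisy_data()
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Correct: the model never sees the test rows while fitting.
honest = accuracy(X_train, y_train, X_test, y_test)

# The bug from the review: X_train = X, so every test row is also a training row.
leaky = accuracy(X, y, X_test, y_test)

print(f"held-out accuracy: {honest:.2f}")
print(f"leaky accuracy:    {leaky:.2f}")  # 1.00: each test point finds itself at distance 0
```

The held-out number is the honest estimate; the leaky number only measures how well the model remembers rows it was given.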
You also seem to have an unrelated file in your svm folder.
Training should use 100% of the data, because the values we will predict in the future are unknown, and training on everything gives the best results on them. train_test_split() is just for our own satisfaction, to know how well our model performs. But in a real-life problem, training on 100% of the data is best, just like teaching a student 100% of the material before giving the test. What matters is the score on the test dataset. And there is no unrelated file in the svm folder; there are 2 files in it. One is the model using a linear SVM classifier and the other uses an RBF SVM classifier.
@RohanSahana
In that case, running predictions on your test examples as you have done is redundant. Perhaps the problem statement was a little vague. You can then remove the two lines I pointed out in my previous comment, train your models on the training set you made after splitting, run predictions on the test examples you made after splitting, print the score, and then repost your files.
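The workflow suggested in this comment can be sketched as follows. This is a hypothetical plain-Python illustration, not the PR's notebooks: the `split`, `fit_centroids`, `predict`, and `score` helpers, the nearest-centroid classifier, and the toy data are all assumptions made to keep the example dependency-free. Step 1 is what the review asks for (fit on the training split, score on the test split). Step 2, refitting on all labelled data only after the test score is recorded, is one common practice and is included as an assumption.

```python
import random
import statistics


def split(rows, labels, test_size=0.25, seed=42):
    """Hypothetical helper: shuffle once, hold out `test_size` of the rows."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


def fit_centroids(rows, labels):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    model = {}
    for label in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        model[label] = [statistics.fmean(col) for col in zip(*members)]
    return model


def predict(model, row):
    """Pick the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))


def score(model, rows, labels):
    """Accuracy: fraction of rows whose predicted class matches the label."""
    return sum(predict(model, r) == lab for r, lab in zip(rows, labels)) / len(labels)


# Toy data (an assumption): two overlapping Gaussian blobs, 100 points each.
rng = random.Random(0)
rows = [(rng.gauss(c, 1.5), rng.gauss(c, 1.5)) for c in (0.0, 2.0) for _ in range(100)]
labels = [0] * 100 + [1] * 100

# 1. Evaluate: fit on the training split only, score on the untouched test split.
X_train, X_test, y_train, y_test = split(rows, labels)
model = fit_centroids(X_train, y_train)
test_score = score(model, X_test, y_test)
print(f"test accuracy: {test_score:.2f}")

# 2. Optionally, once the test score is recorded, refit on every labelled row
#    for the model you actually use on new, unlabelled data.
final_model = fit_centroids(rows, labels)
```

The reported score always comes from step 1, so it reflects data the model did not see during fitting, while step 2 still lets the final model learn from every example.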
You can ignore my comment on your svm folder. Your files are alright.
Resolves Issue #46
Description
Added breast cancer prediction with Logistic Regression, Random Forest and SVM (Linear and RBF)
Technical Specifications
The scores for each classifier are high, over 95%, which suggests that the models perform well.
How to run
Open the files with notebook software such as Jupyter Notebook, or use Google Colab.
Checklist