Fixed a bug with RandomForestClassifier: previous versions ran clf = clf.fit(X, Y) on the whole dataset, so the model was later evaluated on data it had already seen. That explains the perfect accuracy.
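A minimal sketch of the corrected pattern, assuming a standard scikit-learn workflow (load_iris stands in for the project's actual X, Y, which aren't shown here): fit on a training split, score on a held-out split.

    from sklearn.datasets import load_iris  # stand-in data, not the project's dataset
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, Y = load_iris(return_X_y=True)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25)

    clf = RandomForestClassifier()
    clf.fit(X_train, Y_train)         # fit only on the training split
    print(clf.score(X_test, Y_test))  # accuracy on unseen data, no longer perfect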
The data is already cleaned, meaning every feature carries useful signal. Still, setting max_features=1 in BaggingClassifier and max_features=None in RandomForestClassifier improves performance.
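A sketch of those two parameter changes, hedged since the surrounding estimator settings aren't shown (the KNN base estimator matches the Bagging+knn run below):

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    # Each bagged KNN is trained on a single feature, decorrelating the ensemble members.
    bagging = BaggingClassifier(KNeighborsClassifier(), max_features=1)

    # Every split considers all features, since each feature carries signal.
    forest = RandomForestClassifier(max_features=None)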
To speed up KNN, I set algorithm='brute', which computes distances against the whole training set instead of building a tree-based index. With our small dataset, that's not a big deal.
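For reference, a one-line sketch of that setting (other parameters left at their defaults):

    from sklearn.neighbors import KNeighborsClassifier

    # Brute force skips constructing a KD-/ball-tree; on small datasets the
    # tree-building overhead outweighs any per-query savings.
    knn = KNeighborsClassifier(algorithm='brute')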
Now the accuracies are (over 100 runs):
100%|███████████████████████████████████████| 100/100 [00:01<00:00, 59.32it/s]
Algorithm: KNN
Min : 0.45535714285714285
Max : 0.6521739130434783
Mean : 0.5311316330992192
Median : 0.5278888888888889
Stdev : 0.03616387925953278
Variance: 0.0013078261630980652
100%|███████████████████████████████████████| 100/100 [00:02<00:00, 37.69it/s]
Algorithm: Bagging+knn
Min : 0.4
Max : 0.6090909090909091
Mean : 0.4978573258701024
Median : 0.49789915966386555
Stdev : 0.04510562271428201
Variance: 0.002034517200443153
100%|███████████████████████████████████████| 100/100 [00:01<00:00, 95.90it/s]
Algorithm: Decision Tree
Min : 0.37719298245614036
Max : 0.6
Mean : 0.49276376502243846
Median : 0.49545416976609635
Stdev : 0.04116087974073015
Variance: 0.0016942180210308497
100%|███████████████████████████████████████| 100/100 [00:03<00:00, 28.41it/s]
Algorithm: Random Forest
Min : 0.44642857142857145
Max : 0.6434782608695652
Mean : 0.5521866955639984
Median : 0.555050505050505
Stdev : 0.036821076976389644
Variance: 0.0013557917097012117
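For context, a hedged sketch of how stats like these can be produced; the actual benchmark script isn't shown, so the split ratio, stand-in dataset, and estimator settings here are assumptions:

    import statistics
    from sklearn.datasets import load_iris  # stand-in data
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from tqdm import tqdm

    X, Y = load_iris(return_X_y=True)
    scores = []
    for _ in tqdm(range(100)):  # 100 runs, fresh random split each time
        X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25)
        clf = RandomForestClassifier().fit(X_tr, Y_tr)
        scores.append(clf.score(X_te, Y_te))

    print("Algorithm: Random Forest")
    print("Min     :", min(scores))
    print("Max     :", max(scores))
    print("Mean    :", statistics.mean(scores))
    print("Median  :", statistics.median(scores))
    print("Stdev   :", statistics.stdev(scores))
    print("Variance:", statistics.variance(scores))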