ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0
27.82k stars, 12.74k forks

Gaussian Mixture models mismatch predicted classes #221

Status: Open. Rahkovsky opened this issue 4 years ago.

Rahkovsky commented 4 years ago

https://colab.research.google.com/github/ageron/handson-ml2/blob/master/09_unsupervised_learning.ipynb#scrollTo=iBdY4mAzvkHa

When I run the notebook in Colab, the hard-coded mapping from cluster IDs to class labels does not match the clusters the model actually produces, so the accuracy comes out as zero. This is the cell in question:

    y_pred = GaussianMixture(n_components=3, random_state=42).fit(X).predict(X)
    mapping = np.array([2, 0, 1])
    y_pred = np.array([mapping[cluster_id] for cluster_id in y_pred])
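Why a wrong mapping can give exactly zero accuracy: cluster IDs are arbitrary, so a clustering can group the points perfectly and still score 0% if the lookup array is the wrong permutation. A minimal sketch on made-up toy labels (not the iris data), using only NumPy; `apply_mapping` is a hypothetical helper equivalent to the list comprehension above:

```python
import numpy as np

# Toy ground truth, and a clustering that groups the points perfectly
# but assigns its own (arbitrary) cluster IDs.
y_true = np.array([0, 0, 1, 1, 2, 2])
clusters = np.array([1, 1, 2, 2, 0, 0])  # cluster 1 holds class 0, and so on


def apply_mapping(mapping, cluster_ids):
    """Translate each cluster ID into a class label via a lookup array."""
    return np.asarray(mapping)[cluster_ids]


mapping_ok = [2, 0, 1]   # cluster 0 -> class 2, 1 -> 0, 2 -> 1
mapping_bad = [1, 2, 0]  # a different permutation of the same labels

acc_ok = np.mean(apply_mapping(mapping_ok, clusters) == y_true)
acc_bad = np.mean(apply_mapping(mapping_bad, clusters) == y_true)
print(acc_ok, acc_bad)
```

Which permutation is "correct" depends entirely on the cluster IDs the model happens to produce, which is why a hard-coded mapping is fragile.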

I suggest replacing each cluster ID with the most common true label (from `y`) among that cluster's members before computing the accuracy:

    # X and y are the feature matrix and true labels loaded earlier in the notebook
    y_pred = GaussianMixture(n_components=3, random_state=42).fit(X).predict(X)

    # Replace each prediction with the most common true label in its cluster.
    # All indices are computed first, so relabeling one cluster cannot
    # change which points are found for the next one.
    dic_val = {}
    index = {}
    for i in range(3):
        index[i] = np.where(y_pred == i)[0]
        dic_val[i] = np.bincount(y[index[i]]).argmax()

    for i in range(3):
        y_pred[index[i]] = dic_val[i]
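The same majority-vote idea can be packaged as a small, self-contained helper. This is a sketch, not code from the notebook: the function name `majority_vote_relabel` is hypothetical, it assumes integer labels starting at 0 (required by `np.bincount`), and the data below are toy values:

```python
import numpy as np


def majority_vote_relabel(cluster_ids, y_true, n_clusters):
    """Map each cluster ID to the most frequent true label among its members.

    Assumes y_true holds non-negative integer labels (needed by np.bincount).
    Empty clusters keep their own ID as a fallback.
    """
    cluster_ids = np.asarray(cluster_ids)
    y_true = np.asarray(y_true)
    mapping = np.arange(n_clusters)  # identity fallback for empty clusters
    for k in range(n_clusters):
        members = y_true[cluster_ids == k]
        if members.size > 0:
            mapping[k] = np.bincount(members).argmax()
    # Relabel from the original IDs in a single lookup, so updating one
    # cluster can never interfere with another.
    return mapping[cluster_ids], mapping


# Toy example: one point of class 0 ended up in the cluster dominated by class 2.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
clusters = np.array([2, 2, 1, 0, 0, 1, 1, 1])
relabeled, mapping = majority_vote_relabel(clusters, y_true, 3)
accuracy = np.mean(relabeled == y_true)
```

With the notebook's variables this would be called as `majority_vote_relabel(y_pred, y, 3)`. One caveat: majority voting can send two clusters to the same class, so the resulting mapping is not guaranteed to be a permutation.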

My name is Ilya Rahkovsky; I am teaching a class using your textbook. Thank you for sharing your code.

benherbertson commented 4 years ago

I just encountered this issue myself while working through the chapter.

I believe mapping = np.array([2, 0, 1]) should be mapping = np.array([1, 2, 0]).
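Rather than hard-coding either permutation, the mapping could be chosen automatically. The sketch below uses a hypothetical helper, `best_permutation_mapping`, on toy data only: it tries every cluster-to-class permutation and keeps the one with the highest accuracy, which for 3 clusters means just 6 candidates:

```python
from itertools import permutations

import numpy as np


def best_permutation_mapping(cluster_ids, y_true, n_classes):
    """Return the cluster -> label permutation that maximizes accuracy.

    Brute force over all n_classes! permutations; fine for small k such as 3.
    Ties are broken in favor of the first permutation found.
    """
    cluster_ids = np.asarray(cluster_ids)
    y_true = np.asarray(y_true)
    best_mapping, best_acc = None, -1.0
    for perm in permutations(range(n_classes)):
        mapping = np.array(perm)
        acc = np.mean(mapping[cluster_ids] == y_true)
        if acc > best_acc:
            best_mapping, best_acc = mapping, acc
    return best_mapping, best_acc


# Toy data: a perfect clustering with arbitrary cluster IDs.
y_true = np.array([0, 0, 1, 1, 2, 2])
clusters = np.array([1, 1, 2, 2, 0, 0])
best_mapping, best_acc = best_permutation_mapping(clusters, y_true, 3)
```

In the notebook this would be `best_permutation_mapping(y_pred, y, 3)` applied to the raw GaussianMixture output. Brute force grows factorially with the number of clusters; for larger k, `scipy.optimize.linear_sum_assignment` with `maximize=True` on the cluster-vs-class confusion matrix solves the same assignment problem efficiently.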