amueller / introduction_to_ml_with_python

Notebooks and code for the book "Introduction to Machine Learning with Python"

Show how features derived from kmeans separate the two half-moons #108

Open qinhanmin2014 opened 5 years ago

qinhanmin2014 commented 5 years ago

In notebook 03-unsupervised-learning

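from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)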
kmeans = KMeans(n_clusters=10, random_state=0)
kmeans.fit(X)
y_pred = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=60, cmap='Paired')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=60,
            marker='^', c=range(kmeans.n_clusters), linewidth=2, cmap='Paired')
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
print("Cluster memberships:\n{}".format(y_pred))

The book only prints the transformed features and claims that a linear model can now separate the two half-moons:

distance_features = kmeans.transform(X)
print("Distance feature shape: {}".format(distance_features.shape))
print("Distance features:\n{}".format(distance_features))
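For context, here is a minimal sketch of what these distance features contain, assuming `KMeans.transform` returns the Euclidean distance from each sample to each cluster center (which is how the scikit-learn documentation describes it). The names `manual`, `same_distances` and `same_labels` are only illustrative, and `n_init=10` is set explicitly just to keep the behaviour stable across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# One column per cluster center: 200 samples x 10 centers
distance_features = kmeans.transform(X)
print(distance_features.shape)  # (200, 10)

# Recompute the distances by hand via broadcasting:
# (200, 1, 2) - (1, 10, 2) -> (200, 10, 2), then take the norm over the last axis
manual = np.linalg.norm(
    X[:, np.newaxis, :] - kmeans.cluster_centers_[np.newaxis, :, :], axis=2
)
same_distances = np.allclose(distance_features, manual, atol=1e-6)
print("transform matches hand-computed distances:", same_distances)

# The predicted cluster should be the nearest center, i.e. the smallest distance
same_labels = np.array_equal(distance_features.argmin(axis=1), kmeans.predict(X))
print("argmin of distances matches predict:", same_labels)
```

So each sample is re-described by ten numbers instead of two, which is what gives a linear model more room to draw a boundary.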

Maybe it would be better to demonstrate how the features derived from kmeans separate the two half-moons, e.g.:

import numpy as np
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(distance_features, y)
xx = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 100)
yy = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 100)
XX, YY = np.meshgrid(xx, yy)
X_grid = np.c_[XX.ravel(), YY.ravel()]
X_grid_kmeans = kmeans.transform(X_grid)
decision_values = clf.decision_function(X_grid_kmeans)

plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Paired')
plt.contour(XX, YY, decision_values.reshape(XX.shape), levels=[0])
plt.show()

[Attached screenshot: 2018-12-17_112535]
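A numeric check could complement the plot. The sketch below compares the training accuracy of a logistic regression on the raw two features with one fit on the kmeans distance features. It is only an illustration, not something taken from the book: the names `clf_raw`, `clf_dist`, `acc_raw` and `acc_dist` are made up here, `max_iter=1000` and `n_init=10` are chosen just to avoid version-dependent warnings, and the scores are measured on the training data, so they say nothing about generalization.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
distance_features = kmeans.transform(X)

# Linear model on the original two features
clf_raw = LogisticRegression(max_iter=1000).fit(X, y)
acc_raw = clf_raw.score(X, y)

# Same linear model on the ten distance-to-center features
clf_dist = LogisticRegression(max_iter=1000).fit(distance_features, y)
acc_dist = clf_dist.score(distance_features, y)

print("Training accuracy, raw features:      {:.3f}".format(acc_raw))
print("Training accuracy, distance features: {:.3f}".format(acc_dist))
```

If the claim in the book holds, the second number should be noticeably higher than the first.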

amueller commented 5 years ago

Thanks, that might indeed be a useful addition for the next print.