KeithTheEE / scipy-cluster

Automatically exported from code.google.com/p/scipy-cluster

Feature Request: Vector quantisation #12

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Not sure if this is already doable, but some docs on this use case would be
great, i.e. something like the scipy-cluster vq function: given a feature
vector, which cluster does it fall in?

Thanks,
Loki

Original issue reported on code.google.com by loki.dav...@gmail.com on 21 May 2008 at 7:08

GoogleCodeExporter commented 8 years ago
Loki asked me how to do vector quantization with hierarchical clustering, and
requested a code example. Below is my reply to his request. I have changed the
category from Type-Defect to Type-Enhancement.

Damian

------------------------------------------

I don't have much time right now so this note will be quick. The
example code below generates 3 Gaussian clusters of 100 points each,
with centers drawn at random from a 10 by 10 unit region. The points
are clustered with centroid linkage, then the hierarchy is cut into
flat clusters with fcluster. The members of each cluster are then used
to compute centroids.

If you are trying to do vector quantization, you may find k-means
easier to work with. If you don't know the number of codes in the code
book (i.e. the number of clusters) a priori, you might try QT
clustering or mean shift with kernel density estimation. k-means is in
scipy (scipy.cluster.vq), and QT clustering and mean shift will be
integrated into scipy-cluster this summer.
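
As a rough illustration of the k-means route, here is a minimal sketch using scipy.cluster.vq. It assumes a recent SciPy (the seed argument and minit='++' are not available in old releases), and the blob centers and query points are made up for the example. kmeans2 builds the code book and vq assigns each new vector to its nearest code.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
nc, ppc = 3, 100  # number of clusters, points per cluster

# Three well-separated 2-D blobs; the centers are arbitrary illustration values.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((ppc, 2)) for c in centers])

# Build the code book (one row per code) with k-means.
codebook, labels = kmeans2(X, nc, minit='++', seed=0)

# Quantize new feature vectors: index of the nearest code and the distance to it.
new_points = np.array([[0.2, -0.1], [9.7, 0.3]])
codes, dists = vq(new_points, codebook)

print(codes)
print(dists)
```

The codes array answers the "which cluster does it fall in" question directly; the numbering of the codes is arbitrary and can change from run to run if the seed is not fixed.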

You might try using kd-trees to get better performance out of
membership lookups. The ANN scikit has a good implementation.
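
A minimal sketch of a kd-tree lookup follows. It uses scipy.spatial.cKDTree in place of the ANN scikit, purely because it is widely available; the centroid values and query points are hypothetical. The tree is built once over the code book, and each query returns the index of the nearest centroid.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical code book: one row per cluster centroid.
centroids = np.array([[1.0, 1.0], [5.0, 8.0], [9.0, 2.0]])

# Build the tree once; it can then be reused for any number of lookups.
tree = cKDTree(centroids)

# Each query row is assigned to its nearest centroid (k=1 by default).
queries = np.array([[0.8, 1.3], [8.5, 2.5], [5.2, 7.1]])
dists, codes = tree.query(queries)

print(codes)  # -> [0 2 1]
```

With only a handful of centroids a brute-force distance computation is just as fast; the tree is likely to pay off mainly when the code book is large or lookups are very frequent.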

I hope this helps.

Cheers,

Damian

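import numpy as np
import matplotlib.pyplot as plt
# in current SciPy the same functions live in scipy.cluster.hierarchy
import hcluster

nc = 3     # number of clusters
ppc = 100  # points per cluster

# draw every point from a zero-mean gaussian with standard deviation 0.5
X = np.random.randn(nc * ppc, 2) * 0.5

# shift each block of ppc points to a random center in the 10 by 10 region
for i in range(nc):
    shift = np.random.rand(2) * 10.0
    print(shift)
    X[i * ppc:(i + 1) * ppc, :] += shift

# plot the gaussians
plt.plot(X[:, 0], X[:, 1], 'bo')
plt.show()

# perform centroid linkage
Z = hcluster.linkage(X, 'centroid')

# cut the hierarchy into at most nc flat clusters; fcluster labels start
# at 1, so subtract 1 to get 0-based labels
labels = hcluster.fcluster(Z, nc, 'maxclust') - 1

# print the labels returned.
print(labels)

centroids = np.zeros((nc, 2))

# for each cluster, compute its centroid based on the labels vector.
# note: 'maxclust' may yield fewer than nc clusters, in which case the
# mean of an empty cluster is nan
for i in range(nc):
    centroids[i, :] = X[labels == i].mean(axis=0)

# plot the points (blue) and the cluster centroids (red)
plt.plot(X[:, 0], X[:, 1], 'bo')
plt.plot(centroids[:, 0], centroids[:, 1], 'ro')
plt.show()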
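
The example above stops at computing the centroids. To answer the original question (given a feature vector, which cluster does it fall in), one option is to assign each vector to its nearest centroid. The sketch below is one way to do that, not part of hcluster itself: quantize is a hypothetical helper, the blob centers and query points are invented, and scipy.cluster.hierarchy is used as the current home of the linkage and fcluster functions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def quantize(points, centroids):
    """Return, for each row of `points`, the index of the nearest centroid."""
    points = np.atleast_2d(points)
    # Squared Euclidean distances, shape (n_points, n_centroids).
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)

# Three well-separated blobs; the centers are arbitrary illustration values.
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 2)) for c in true_centers])

# Centroid linkage, then cut into at most 3 flat clusters (labels start at 1).
Z = linkage(X, 'centroid')
labels = fcluster(Z, 3, 'maxclust')

# Build the code book only from labels that actually occur, since
# 'maxclust' can return fewer clusters than requested.
centroids = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])

# Look up the cluster of two new feature vectors.
codes = quantize(np.array([[0.2, -0.1], [9.7, 0.3]]), centroids)
print(codes)
```

Nearest-centroid assignment matches how the centroids were used above, but it is only an approximation of the hierarchical clustering: a new point near a cluster boundary may land in a different cluster than it would if the hierarchy were recomputed with that point included.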

Original comment by damian.e...@gmail.com on 23 May 2008 at 1:11