deads / scipy-cluster

Automatically exported from code.google.com/p/scipy-cluster

upper triangular and squareform of same distance matrix yield different linkage solutions #24

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
After producing an upper triangular distance matrix with pdist, I used
squareform to convert it to a square matrix before using it as
input to linkage.
So, given
Y = pdist(data)
Y_sq = squareform(Y)
linkage(Y_sq) does NOT equal linkage(Y).

Here I expected linkage(Y_sq) == linkage(Y).

I have only read documentation indicating that Y (as upper triangle) is the
standard input to linkage, but using Y_sq yields the result I was expecting
(maybe just a fluke?).  Matlab linkage does not accept Y_sq as input.  What
happens when I input Y_sq? Why is the result different from using Y?

What version of the product are you using? On what operating system?
I am using hcluster 0.2.0 Mac 10.5

Original issue reported on code.google.com by AndyCCon...@gmail.com on 14 Jul 2009 at 9:19

GoogleCodeExporter commented 9 years ago
Hi again,
After consulting the source code, we found that when linkage is given a square
matrix, it assumes the input is data (observations), not a distance matrix, and
calculates a new distance matrix by calling pdist.  Thus, where
Y = pdist(data) and Y_sq = squareform(Y),

linkage(Y_sq) is equivalent to linkage(pdist(Y_sq, metric='euclidean'))
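
To make the mechanism concrete, here is a minimal pure-Python sketch of the input handling described above. It is not the hcluster source: `toy_pdist`, `toy_squareform`, and `distances_seen_by_linkage` are hypothetical stand-ins written for illustration, and the data points are made up.

```python
import math
from itertools import combinations


def toy_pdist(rows):
    """Stand-in for pdist: Euclidean distance for every pair of rows,
    in the order (0,1), (0,2), ..., (n-2,n-1)."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]


def toy_squareform(condensed, n):
    """Stand-in for squareform: expand a condensed distance vector
    into an n x n symmetric matrix with a zero diagonal."""
    square = [[0.0] * n for _ in range(n)]
    for d, (i, j) in zip(condensed, combinations(range(n), 2)):
        square[i][j] = square[j][i] = d
    return square


def distances_seen_by_linkage(y):
    """Hypothetical helper: the distances the clustering step would use.

    A flat sequence is taken as condensed distances. A nested (2-D)
    input is taken as observations and run through pdist again, which
    is the behaviour reported in this comment.
    """
    if y and isinstance(y[0], (list, tuple)):
        return toy_pdist(y)
    return list(y)


data = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
Y = toy_pdist(data)
Y_sq = toy_squareform(Y, len(data))

from_condensed = distances_seen_by_linkage(Y)   # the original distances
from_square = distances_seen_by_linkage(Y_sq)   # distances between rows of Y_sq
```

With this input, `from_condensed` is just `Y`, while `from_square` holds the distances between the rows of `Y_sq`, so any clustering built on the two lists will generally differ.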

cheers,
Andy

Original comment by AndyCCon...@gmail.com on 15 Jul 2009 at 6:48

GoogleCodeExporter commented 9 years ago
Hi Andy,

Thanks for your report. As you stated, the first input to
``linkage`` can be either:
   * the upper triangular of the distance matrix Y, or
   * the original matrix of observations X used to produce Y via pdist.

This is consistent with MATLAB's documentation as shown below.
"""
Z = linkage(y)
Z = linkage(y,method)
Z = linkage(X,method,metric)
Z = linkage(X,method,inputs)
"""

This is also consistent with the hcluster documentation.

"""
Z = linkage(X, method, metric='euclidean')

 Performs hierarchical clustering on the objects defined by the
 n by m observation matrix X.
"""

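If you only have the square form at hand, one option is to convert it back to the condensed form before clustering. The sketch below assumes SciPy's `scipy.cluster.hierarchy` and `scipy.spatial.distance` modules rather than hcluster 0.2.0 itself, and the four data points are made up for illustration.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Four 2-D observations (illustrative data, not from the report).
data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 3.0]])

Y = pdist(data)        # condensed distances, length n*(n-1)/2 = 6
Y_sq = squareform(Y)   # 4 x 4 symmetric matrix with a zero diagonal

Z_condensed = linkage(Y)                 # Y is read as distances
Z_roundtrip = linkage(squareform(Y_sq))  # square form condensed again first

with warnings.catch_warnings():
    # Some SciPy versions warn that a 2-D input looks like a distance matrix.
    warnings.simplefilter("ignore")
    Z_square = linkage(Y_sq)             # Y_sq is read as 4 observations in 4-D

same_after_roundtrip = np.allclose(Z_condensed, Z_roundtrip)
same_with_square = np.allclose(Z_condensed, Z_square)
print(same_after_roundtrip, same_with_square)
```

Here `squareform` is applied a second time to recover the condensed vector, so `Z_roundtrip` should match `Z_condensed`, whereas `Z_square` is built from distances between the rows of `Y_sq` and should not.
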
I hope this helps.

Thanks,

Damian

Original comment by damian.e...@gmail.com on 16 Jul 2009 at 4:04