Hi again,
after consulting the source code, we found that when linkage is given a square
matrix as input, it assumes the input is data -- not a distance matrix -- and
calculates a new distance matrix by calling pdist. Thus, where
Y = pdist(data) and Y_sq = squareform(Y),
linkage(Y_sq) is equivalent to linkage(pdist(Y_sq, metric='euclidean')).
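A minimal sketch of this behaviour follows. It uses scipy.cluster.hierarchy as a stand-in for hcluster, on the assumption that the two share this calling convention; the random data, the seed, and the 'single' method are illustrative choices only. Some SciPy versions warn when a 2-D input looks like a square distance matrix, so the warning is silenced here.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Illustrative data: 6 observations in 3 dimensions (assumed, not from the report).
rng = np.random.default_rng(0)
data = rng.random((6, 3))

Y = pdist(data)        # condensed distance vector, length 6*5/2 = 15
Y_sq = squareform(Y)   # square 6x6 distance matrix

with warnings.catch_warnings():
    # A 2-D input is treated as observations; recent SciPy may warn that it
    # looks like an uncondensed distance matrix.
    warnings.simplefilter("ignore")
    Z_square = linkage(Y_sq, method='single')

# What the square input is actually turned into: distances between the ROWS of Y_sq.
Z_repdist = linkage(pdist(Y_sq, metric='euclidean'), method='single')

# What was probably intended: cluster on the original distances.
Z_intended = linkage(Y, method='single')

same_as_repdist = np.allclose(Z_square, Z_repdist)
heights_differ = not np.allclose(Z_square[:, 2], Z_intended[:, 2])
print(same_as_repdist, heights_differ)
```

The merge heights differ because each row-to-row distance in Y_sq is at least sqrt(2) times the corresponding original distance.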
cheers,
Andy
Original comment by AndyCCon...@gmail.com
on 15 Jul 2009 at 6:48
Hi Andy,
Thanks for your report. As you stated, the first input to ``linkage`` can be
either:
* the condensed distance matrix Y (the upper triangle of the square distance
  matrix, as returned by pdist), or
* the original matrix of observations X used to produce Y via pdist.
This is consistent with MATLAB's documentation as shown below.
"""
Z = linkage(y)
Z = linkage(y,method)
Z = linkage(X,method,metric)
Z = linkage(X,method,inputs)
"""
This is also consistent with the hcluster documentation.
"""
Z = linkage(X, method, metric='euclidean')
Performs hierarchical clustering on the objects defined by the
n by m observation matrix X.
"""
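As a rough illustration of the two calling conventions, the sketch below passes either the condensed Y or the observations X, and converts a square distance matrix back to condensed form with squareform before clustering. It is written against scipy.cluster.hierarchy on the assumption that it mirrors hcluster here; the data and the 'average' method are arbitrary choices.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Illustrative observation matrix: n = 5 objects, m = 2 features (assumed).
rng = np.random.default_rng(1)
X = rng.random((5, 2))

# Convention 1: pass the condensed distance vector Y.
Y = pdist(X, metric='euclidean')
Z_from_Y = linkage(Y, method='average')

# Convention 2: pass the observations X and let linkage call pdist itself.
Z_from_X = linkage(X, method='average', metric='euclidean')

# If only a square distance matrix is at hand, condense it first.
Y_sq = squareform(Y)              # square 5x5 form
Y_back = squareform(Y_sq)         # back to the condensed vector
Z_from_sq = linkage(Y_back, method='average')

conventions_agree = np.allclose(Z_from_Y, Z_from_X)
roundtrip_agrees = np.allclose(Z_from_Y, Z_from_sq)
print(conventions_agree, roundtrip_agrees)
```

squareform goes in both directions, so calling it on a square matrix is enough to get an input that linkage reads as distances rather than observations.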
I hope this helps.
Thanks,
Damian
Original comment by damian.e...@gmail.com
on 16 Jul 2009 at 4:04
Original issue reported on code.google.com by
AndyCCon...@gmail.com
on 14 Jul 2009 at 9:19