Possible bug in plotSimilarityMatrix

Hi, I'm testing this tool and I find it very interesting; however, I'm having a little problem (I am not sure if this is a bug or if I am missing something).

I have a similarity matrix that I calculated by applying the Jaccard similarity to my data. In R this matrix is stored in a data frame, where identical individuals have a similarity of 1 and completely distinct individuals have a similarity of 0. I am using the function plotSimilarityMatrix and the result seems to be correct: [image]

Nonetheless, I tried to recreate the clustering using hclust. This function needs a dist object, so I computed 1 - my similarity matrix, so that a similarity of 1 becomes a distance of 0 and a similarity of 0 becomes a distance of 1, and then called as.dist(myDistanceMatrix) to get a dist object to use with hclust. I used the default parameters for hclust (euclidean distance and complete method); however, the resulting clustering is not as nice as the one I got before: [image]
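A minimal sketch of the conversion described above, for anyone who wants to reproduce it. The 3x3 similarity values are made up for illustration and are not the data from this report; only myDistanceMatrix is a name taken from the text.

```r
# Toy similarity matrix (hypothetical values), stored as a data frame
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
sim_df <- as.data.frame(sim)

# Similarity 1 -> distance 0, similarity 0 -> distance 1
myDistanceMatrix <- 1 - as.matrix(sim_df)

# as.dist() only re-labels the matrix as a dist object; it does not
# compute any new distance from the values
d <- as.dist(myDistanceMatrix)

# hclust() with its default linkage (method = "complete")
hc <- hclust(d)
```

Here the first merge joins a and b, the pair with the highest similarity, which is what one would expect when 1 - similarity is used directly as the distance.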

I do not know which clustering is the correct one, but I have checked the code of the function plotSimilarityMatrix and it uses the pheatmap library. If I am not mistaken, the similarity matrix received as input by plotSimilarityMatrix is passed to pheatmap. I looked into the pheatmap function and found the following code, which is used for calculating the dendrogram:

cluster_mat = function(mat, distance, method){
    if(!(method %in% c("ward.D", "ward.D2", "ward", "single", "complete", "average", "mcquitty", "median", "centroid"))){
        stop("clustering method has to one form the list: 'ward', 'ward.D', 'ward.D2', 'single', 'complete', 'average', 'mcquitty', 'median' or 'centroid'.")
    }
    if(!(distance[1] %in% c("correlation", "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")) & class(distance) != "dist"){
        stop("distance has to be a dissimilarity structure as produced by dist or one measure  form the list: 'correlation', 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary', 'minkowski'")
    }
    if(distance[1] == "correlation"){
        d = as.dist(1 - cor(t(mat)))
    }
    else{
        if(class(distance) == "dist"){
            d = distance
        }
        else{
            d = dist(mat, method = distance)
        }
    }

    return(hclust(d, method = method))
}

This code checks if the type of the input matrix is a dist object. I think that in this case it would never be a dist object, because the function plotSimilarityMatrix expects a similarity matrix, not a dissimilarity one. Thus, the above function from pheatmap assumes that the input matrix contains data, not distances, and it calculates a distance matrix through d = dist(mat, method = distance). The clustering that appears in the plot from plotSimilarityMatrix therefore results from calculating the distance among the rows of the input similarity matrix.
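The difference between the two routes can be seen on a small example. This is only a sketch with made-up similarity values; it assumes that the distance reaching cluster_mat is the string "euclidean" (pheatmap's default), which I have not confirmed for plotSimilarityMatrix.

```r
# Toy similarity matrix (hypothetical values)
sim <- matrix(c(1.0, 0.8, 0.1,
                0.8, 1.0, 0.2,
                0.1, 0.2, 1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Route 1: use 1 - similarity directly as the distance
d_direct <- as.dist(1 - sim)

# Route 2: what dist(mat, method = distance) does when mat is the
# similarity matrix: Euclidean distance between its rows
d_rows <- dist(sim, method = "euclidean")

hc_direct <- hclust(d_direct, method = "complete")
hc_rows   <- hclust(d_rows, method = "complete")

# The two dist objects hold different values, so the dendrograms
# can differ in heights and, on real data, in topology
print(d_direct)
print(d_rows)

# When calling pheatmap directly, a dist object can be supplied instead of
# a method name (left commented out so the sketch runs without pheatmap):
# pheatmap::pheatmap(sim,
#                    clustering_distance_rows = d_direct,
#                    clustering_distance_cols = d_direct)
```

Whether plotSimilarityMatrix forwards clustering_distance_rows and clustering_distance_cols to pheatmap is something I have not checked, so the commented call uses pheatmap on its own.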

Am I correct? I hope I have misunderstood something, because I really like the first plot provided by your library, much more than the one I obtained afterwards by applying hclust.

Kind regards, Francisco Abad.

acabassi / klic
