UBod / apcluster

R package implementing affinity propagation clustering along with various utilities
https://github.com/UBod/apcluster

Problem with height in aggExCluster object #7

Open ax-ekk opened 8 months ago

ax-ekk commented 8 months ago

Hi!

Thanks for this nice package. I have come across some odd behaviour in the tree produced by aggExCluster(). It does not happen with all datasets, but I have attached one similarity matrix for which it does. In this example, the merge of cluster 15 with clusters 14+22 happens at a greater height than the subsequent merge with clusters 3+8.

library(apcluster)
library(dendextend)
load("sims.Rdata") # this is my similarity matrix

apclust_res <- apcluster(s = sims, q = 0)
agg_res <- aggExCluster(x = apclust_res, s = sims)

plot(agg_res)
# the tree is messed up, as one can see even better here:

as.dendrogram(as.hclust(agg_res)) %>%
  set("labels_col", value = 1:5, k = 1) %>%
  plot()

Any idea what can be causing this?

Thanks again,
Elin

Rplot.pdf sims.tar.gz
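For readers who want to locate such inversions without inspecting the plot, here is a minimal sketch. It assumes that the tree converted with as.hclust() follows the usual hclust convention of non-decreasing merge heights, so that any drop between consecutive merges is an inversion. The helper find_inversions() is hypothetical and not part of apcluster, and the toy heights are invented purely for illustration.

```r
# Hypothetical helper (not part of apcluster): return the indices of merge
# steps whose height is lower than that of the preceding merge.
find_inversions <- function(heights) {
  which(diff(heights) < 0) + 1L
}

# Toy heights standing in for as.hclust(agg_res)$height; the fourth merge
# happens at a lower height than the third.
toy_heights <- c(0.05, 0.20, 0.45, 0.40, 0.90)
print(find_inversions(toy_heights))

# With the objects from the post above, the call would presumably be:
# find_inversions(as.hclust(agg_res)$height)
```

An empty result would mean the heights are monotone and the dendrogram has no crossing branches.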

UBod commented 8 months ago

Hi Elin,

Thanks for bringing this issue to my attention! I ran your example with your data and could reproduce the issue. I suspect it is caused by a peculiarity of the similarity matrix: quite similar samples end up in different clusters, so that, when two clusters are merged, the data are suddenly better explained by a common exemplar than the two clusters were by their respective exemplars. I agree that this is counterintuitive. However, I do not have a clean solution yet. The aggExCluster() algorithm was a sort of add-on to the 'apcluster' package, and no one has so far made the effort to study its mathematical properties in detail. If you need a quick workaround (e.g. because you cannot publish a malformed dendrogram), I suggest slightly adapting the clustering parameters (e.g. slightly increasing q). Maybe that helps.

Sorry and best regards, Ulrich
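The workaround suggested above could be tried systematically along the following lines. This is only a sketch under assumptions: scan_q() and apcluster_heights() are hypothetical helpers, the candidate values of q are arbitrary, and the check assumes that the hclust conversion yields non-decreasing heights for a well-formed tree. Whether any value of q removes the inversion for a given similarity matrix is not guaranteed.

```r
# Hypothetical helper: cluster with a given q, build the aggExCluster() tree,
# and return the merge heights of the converted hclust object.
# Assumes apcluster is installed and exports its as.hclust() method.
apcluster_heights <- function(s, q) {
  res <- apcluster::apcluster(s = s, q = q)
  agg <- apcluster::aggExCluster(x = res, s = s)
  apcluster::as.hclust(agg)$height
}

# Hypothetical helper: for each candidate q, report whether the resulting
# tree contains a height inversion (heights not in non-decreasing order).
scan_q <- function(s, qs, heights_for = apcluster_heights) {
  inverted <- vapply(qs, function(q) is.unsorted(heights_for(s, q)), logical(1))
  data.frame(q = qs, inverted = inverted)
}

# Possible usage with the similarity matrix from the original post:
# scan_q(sims, qs = c(0, 0.05, 0.1, 0.2))
```

Rows with inverted equal to FALSE would indicate parameter settings that give a well-formed dendrogram, at the cost of a clustering that differs from the one obtained with q = 0.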