asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0

compare_clusterings() pick results #57

Closed: vidarsumo closed this issue 2 years ago

vidarsumo commented 2 years ago

These are the results I get from compare_clusterings():

> comparison_long_h$pick
$object
hierarchical clustering with 2 clusters
Using sbd distance
Using PAM (Hierarchical) centroids
Using method average 
Using zscore preprocessing

Time required for analysis:
   user  system elapsed 
   3.01    0.06    0.92 

Cluster sizes with average intra-cluster distance:

  size       av_dist
1  136  3.025812e-01
2    1 -6.661338e-16

$config
   config_id k method symmetric preproc center_preproc distance centroid       Sil
77 config2_5 6 ward.D     FALSE  zscore           TRUE      sbd  default 0.3706267

It seems that the appropriate number of clusters is k = 2, since $object shows two clusters and the size of each. But $config also shows config2_5, which has k = 6. What is config2_5 telling me here?

asardaes commented 2 years ago

Each configuration gets a config_id assigned by the code; it is just a way to uniquely identify one combination of parameters. See, for instance, the example at the end of page 19 in the R Journal paper.

The underscore in 2_5 is just an indication that a single call of one of your configurations returned multiple results, probably due to method = "all" in hierarchical clustering; you likely have rows in the result's data frame where config_id starts with config2_ and every parameter except method has the same value.
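To look at the rows that share one configuration, the data frame can be filtered by the config_id prefix. The following is a minimal sketch: the `results` data frame is a hypothetical stand-in with made-up rows, and only its column names follow the output pasted above.

```r
# Hypothetical stand-in for one results data frame from compare_clusterings();
# the rows are made up for illustration.
results <- data.frame(
  config_id = c("config1", "config2_1", "config2_2", "config2_5"),
  k = c(2L, 6L, 6L, 6L),
  method = c("average", "complete", "single", "ward.D"),
  distance = "sbd",
  stringsAsFactors = FALSE
)

# Keep only the rows that came from the single configuration "config2"
config2_rows <- results[startsWith(results$config_id, "config2_"), ]
print(config2_rows)

# List the columns whose values differ within this group;
# apart from config_id itself, only method is expected to vary.
varying <- names(config2_rows)[
  sapply(config2_rows, function(col) length(unique(col)) > 1)
]
print(varying)
```

With a real comparison, the same filter would be applied to the relevant data frame inside the results element instead of the mock one.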

vidarsumo commented 2 years ago

I think I might have been a bit unclear. The $config element in the list is a single row. This row contains information about k, method, symmetric, etc. In this element, k = 6, but in the $object element, k = 2.

I'm trying to find the optimal number of clusters, so I'm a bit puzzled that the two elements disagree. If k = 2 is the optimal number of clusters, why does it say k = 6 in the $config element of the list?
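The mismatch described here can be checked programmatically by comparing the two values of k. This is only a sketch: `MockClustering` and `pick` are hypothetical stand-ins so that the snippet runs on its own, and it assumes the clustering object stores the number of clusters in an S4 slot named `k`.

```r
library(methods)

# Hypothetical stand-in for the clustering object; assumes the real object
# exposes the number of clusters as an S4 slot called k.
setClass("MockClustering", representation(k = "integer"))

# Mock of the pick element, using the values from the output above
pick <- list(
  object = new("MockClustering", k = 2L),
  config = data.frame(config_id = "config2_5", k = 6L,
                      stringsAsFactors = FALSE)
)

k_object <- pick$object@k   # k actually used by the clustering
k_config <- pick$config$k   # k reported in the configuration row

if (!identical(k_object, k_config)) {
  message("Mismatch: object has k = ", k_object,
          ", config reports k = ", k_config)
}
```

With the real result, `pick` would be replaced by `comparison_long_h$pick`.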

asardaes commented 2 years ago

Ah ok, that seems to be a bug in the ordering. I believe the value from $object is the correct one. I'll push a fix shortly. Do you have R tools installed (or do you use Linux)? If so, you could install remotes and then install from GitHub once it's ready.
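Installing the development version would look roughly like this. It is a sketch that assumes the fix lands on the default branch of the asardaes/dtwclust repository and that a build toolchain (R tools on Windows) is available.

```r
# Install the remotes helper package if it is not available yet
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Build and install dtwclust from its GitHub repository
remotes::install_github("asardaes/dtwclust")
```

Restart the R session afterwards so the newly installed version is the one that gets loaded.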

vidarsumo commented 2 years ago

I have R tools, so I can install from GitHub.