asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0
258 stars, 29 forks

A question about 'pam' in dtwclust('fuzzy'). #6

Closed Wang-Yu-Qing closed 8 years ago

Wang-Yu-Qing commented 8 years ago

When using the 'pam' centroid calculation, the center of each cluster is supposed to be one of the series in that cluster. However, when I tested this:

a <- list()
for (i in 1:5) { a[[i]] <- round(rnorm(10, 5, 5), digits = 0) }
c <- dtwclust(a, 'fuzzy', k = 3, distance = 'dtw', 'pam', seed = 100)
c@centers

The centers seem to be means of the clusters rather than one of the series. Additionally, when I tried k=2 with c<-dtwclust(a,'fuzzy',k=2,distance = 'dtw','pam',seed=100), I got this error: Error in apply(cluster[, -1L], 1L, sum) : dim(X) must have a positive length. I don't understand why either of these happens.

asardaes commented 8 years ago

The fuzzy clustering algorithm that's implemented is fuzzy c-means. Technically, you can change the distance function, but the centroid function does not change. In fact, c@centroid should contain fcm, since fuzzy clustering overrides any user-provided centroid argument. The error you're getting with k=2 is indeed a bug in the code; I'll fix it.

I'm not too familiar with fuzzy clustering, but at least from what I've read, people usually don't cluster time series directly; they apply some kind of transformation first. The centroid function used by fuzzy c-means is essentially a weighted average of all the series, and I'm not sure it could be changed without altering the clustering algorithm significantly.
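
To illustrate the weighted-average point, here is a minimal sketch in base R of a single fuzzy c-means update. It is not dtwclust's internal code: it assumes equal-length series, plain Euclidean distance instead of DTW, a fuzziness exponent m = 2, and the first k series as starting centroids. All variable and function names are hypothetical.

```r
# One fuzzy c-means update in base R (illustrative sketch, not dtwclust code).
# Assumptions: equal-length series, Euclidean distance, fuzziness m = 2.
set.seed(100)
series <- t(replicate(5, round(rnorm(10, 5, 5))))  # 5 series, one per row

k <- 2
m <- 2
centroids <- series[1:k, , drop = FALSE]  # hypothetical starting centroids

# Distance from every series (rows) to every centroid (columns)
dist_to_centroids <- function(x, cent) {
  d <- matrix(0, nrow(x), nrow(cent))
  for (j in seq_len(nrow(cent))) {
    d[, j] <- sqrt(rowSums(sweep(x, 2, cent[j, ])^2))
  }
  d
}

# Clamp tiny distances so the membership update never divides by zero
d <- pmax(dist_to_centroids(series, centroids), 1e-12)

# Membership update: u[i, j] is proportional to d[i, j]^(-2 / (m - 1)),
# normalised so that each series' memberships sum to 1
u <- d^(-2 / (m - 1))
u <- u / rowSums(u)

# Centroid update: weighted average of ALL series, with weights u^m
w <- u^m
new_centroids <- sweep(t(w) %*% series, 1, colSums(w), "/")

print(round(new_centroids, 2))
```

Because every series contributes to every centroid with a nonzero weight, the rows of new_centroids are blends of the input series and will generally not coincide with any single one. A medoid-based method such as PAM, by contrast, always returns members of the data set as centers.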