Closed paco-ceam closed 6 years ago
Unfortunately, that's an area where no best approach exists and there's no way to tell what's objectively best. You can try some "voting" among the indices, choose a subset and base your decision on that, or use a more interactive approach (see the ssdtwclust
app). There are simpler ways too; check this answer.
Even choosing a subset of CVIs to work with is something I can't help you with: I'd have to read the associated paper for each one and then decide which might work best for your goal. You might have to do just that (the main references are in the documentation of the cvi
function).
What I mean by voting is something like
cvis <- sapply(pc_dtw.max, cvi, type = c("DB", "DBstar"))
apply(cvis, 1L, which.min)
# ^^ now you have two votes, DB and DB*, each suggesting a certain number of clusters
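The same voting idea can be extended to all of the internal CVIs. Note that Sil, SF, CH and D are to be maximized, while DB, DB* and COP are to be minimized (see the cvi documentation). A hedged sketch follows; the cvis matrix here is fabricated purely for illustration (in practice it would be the output of sapply(pc_dtw.max, cvi, type = "internal")), and the candidate k values are assumed:

```r
# Which internal CVIs are maximized vs. minimized (per ?cvi in dtwclust)
to_max <- c("Sil", "SF", "CH", "D")
to_min <- c("DB", "DBstar", "COP")

# Fabricated CVI matrix for candidate k = 2, 3, 4 -- a stand-in for
# the real result of sapply(pc_dtw.max, cvi, type = "internal")
cvis <- rbind(
  Sil    = c(0.42, 0.55, 0.40),
  SF     = c(0.10, 0.20, 0.15),
  CH     = c(80,   95,   70),
  DB     = c(1.4,  1.1,  1.6),
  DBstar = c(1.8,  1.3,  2.0),
  D      = c(0.30, 0.45, 0.25),
  COP    = c(0.25, 0.18, 0.30)
)
colnames(cvis) <- c("k=2", "k=3", "k=4")

# One vote per index: which candidate column each CVI prefers
votes <- c(
  apply(cvis[to_max, , drop = FALSE], 1L, which.max),
  apply(cvis[to_min, , drop = FALSE], 1L, which.min)
)

# Tally the votes; the candidate with the most votes "wins"
table(colnames(cvis)[votes])
```

With real data the indices will often disagree, so the tally is a rough tie-breaker, not a definitive answer.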
As a side note: I think the SF index only works well with normalized distances (the distance itself, not the time series).
Hi Alexis, I had also read the Stack Overflow answer. Well, I'll read some of the references and try to decide. Thanks.
Hi Alexis, and thanks for the dtwclust package.
I'm trying to cluster a set of 115 temperature series with dtwclust, but I'm not sure how to choose the optimal number of clusters and clustering method. I have tried partitional clustering (as seen in an example) with a predefined number of clusters.
Not being an expert, I have tried changing some parameters after reading the dtwclust documentation, but could not find big differences in the results. Now I'm trying to run cvi for a range of different numbers of clusters to see which nclusters parameter is "better".
Try
sapply(pc_dtw.max, cvi, type = "internal")
with this output
but I can't figure out how to interpret all these indices. Should I look for the lowest absolute value across all indices and choose the associated number of clusters? Are negative "Sil" values meaningless? Or should I look for the number of clusters with the most low values across all indices?
Thanks and best regards