asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0

Mean and SD not kept when using DBA centroids in tsclust #46

Closed: lucazav closed this issue 4 years ago

lucazav commented 4 years ago

If I run the following code using a list of multivariate time series:

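library(dtwclust)

# my_list: a list of multivariate series (one matrix per series);
# trace and seed are set beforehand (example values: trace <- TRUE; seed <- 123L)
data <- zscore(my_list, keep.attributes = TRUE)

pc_dtw_dba <- tsclust(data, k = 2L:10L,
                      distance = "dtw_basic", centroid = "dba",
                      trace = trace, seed = seed,
                      norm = "L2",
                      args = tsclust_args(cent = list(trace = trace)))

names(pc_dtw_dba) <- paste0("k_", 2L:10L)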

The resulting centroids don't keep the mean and SD attributes:

attr(pc_dtw_dba$k_10@centroids[[1]], "scaled:scale")
# NULL
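Why no single "scaled:scale" value can be attached to a centroid can be illustrated with base R alone. This sketch uses base R's `scale()`, which stores per-series values in the same `scaled:center`/`scaled:scale` attributes queried above (assuming dtwclust's `zscore(..., keep.attributes = TRUE)` behaves likewise), and, purely for illustration, replaces DBA with a plain point-wise mean of two normalized series.

```r
# Two univariate series with different levels and spreads.
x1 <- c(1, 2, 3, 4, 5)        # mean 3,  sd sqrt(2.5)
x2 <- c(10, 30, 20, 50, 40)   # mean 30, sd sqrt(250)

# Normalize each one separately; scale() records the per-series values
# in the "scaled:center" and "scaled:scale" attributes.
z1 <- scale(x1)
z2 <- scale(x2)

# Stand-in for a centroid: a plain point-wise mean of the normalized series
# (DBA would first align them with DTW; this is NOT DBA, only an illustration).
centroid <- (as.vector(z1) + as.vector(z2)) / 2

# Undo the normalization using each input series' own mean/SD.
back1 <- centroid * attr(z1, "scaled:scale") + attr(z1, "scaled:center")
back2 <- centroid * attr(z2, "scaled:scale") + attr(z2, "scaled:center")
print(back1)  # [1] 1.0 2.5 2.5 4.5 4.5
print(back2)  # [1] 10 25 25 45 45
```

Both back-transformations are equally justified, yet they place the same centroid on different scales, so there is no principled choice of which mean/SD pair to keep.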
asardaes commented 4 years ago

zscore normalizes each series separately, so the returned mean/SD values are only valid for each input series individually. DBA derives a new series from several inputs, so those values no longer apply to its output: if each input series has its own mean/SD pair, which pair should be kept for a DBA centroid that was computed from, say, 5 of them, with 5 different pairs? In your case, I imagine you'd have to normalize all series with a single set of values chosen by you (it doesn't have to be z-normalization).
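Following the suggestion to normalize all series with one set of values, a minimal base R sketch could pool every series, compute one mean and SD per variable, and reuse those same values to map centroids back to the original units. `global_stats()`, `normalize_with()` and `denormalize()` are hypothetical helpers, not dtwclust functions, and the sketch assumes each multivariate series is a numeric matrix with variables in columns.

```r
# Hypothetical helpers (not part of dtwclust).

# Pool every time point of every series and compute per-column statistics.
global_stats <- function(series_list) {
  pooled <- do.call(rbind, series_list)
  list(center = colMeans(pooled),
       scale  = apply(pooled, 2L, stats::sd))
}

# Normalize every series with the SAME shared statistics.
normalize_with <- function(series_list, st) {
  lapply(series_list, scale, center = st$center, scale = st$scale)
}

# Undo the shared normalization column by column: x * sd + mean.
denormalize <- function(x, st) {
  x <- as.matrix(x)
  attr(x, "scaled:center") <- NULL  # drop stale attributes from scale()
  attr(x, "scaled:scale") <- NULL
  x <- sweep(x, 2L, st$scale, `*`)
  sweep(x, 2L, st$center, `+`)
}

# Toy data: two bivariate series of different lengths.
set.seed(1)
series <- list(
  cbind(a = rnorm(20, 10, 2), b = rnorm(20, -5, 1)),
  cbind(a = rnorm(30, 12, 3), b = rnorm(30, -4, 2))
)
st <- global_stats(series)
norm_series <- normalize_with(series, st)

# Round trip recovers the original values.
all.equal(denormalize(norm_series[[1]], st), series[[1]])  # TRUE

# With dtwclust installed, one might then run (not executed here):
# pc <- tsclust(norm_series, k = 2L, distance = "dtw_basic", centroid = "dba")
# centroids_orig <- lapply(pc@centroids, denormalize, st = st)
```

Unlike per-series z-normalization, a shared normalization keeps level and amplitude differences between series, so the resulting clusters may differ from those obtained with zscore; whether that is desirable depends on the application.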