asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
GNU General Public License v3.0

Mean and SD not kept when using DBA centroids in tsclust #46

Closed: lucazav closed this issue 4 years ago

lucazav commented 4 years ago

If I run the following code using a list of multivariate time series:

data <- zscore(my_list, keep.attributes = TRUE)

pc_dtw_dba <- tsclust(data, k = 2L:10L,
                    distance = "dtw_basic", centroid = "dba",
                    trace = trace, seed = seed,
                    norm = "L2",
                    args = tsclust_args(cent = list(trace = trace)) )

names(pc_dtw_dba) <- paste0("k_", 2L:10L)

the centroids don't keep the mean and SD attributes:

attr(pc_dtw_dba$k_10@centroids[[1]], "scaled:scale")

asardaes commented 4 years ago

zscore normalizes each series separately, so the stored mean/SD values are only valid for the individual input series they were computed from. DBA modifies the series, so those values are no longer valid for its output: if each input series has its own mean/SD pair, which one should be kept for a DBA centroid that was computed from, say, 5 of them, with 5 different pairs? In your case, I imagine you'd have to normalize all series with a single set of values chosen by you (it doesn't have to be z-normalization).
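As a rough illustration of that last suggestion, here is a minimal base-R sketch. It assumes each multivariate series is a matrix with time in rows and variables in columns, and it picks one mean/SD pair per variable by pooling all series; the toy data and the helper names (`global_mean`, `global_sd`, `normalize`, `denormalize`) are made up for the example and are not part of dtwclust.

```r
# Hypothetical toy data: a list of multivariate series
# (rows = time points, columns = variables).
set.seed(1)
my_list <- list(
  matrix(rnorm(20, mean = 5, sd = 2), ncol = 2),
  matrix(rnorm(30, mean = 5, sd = 2), ncol = 2),
  matrix(rnorm(24, mean = 5, sd = 2), ncol = 2)
)

# One mean/SD pair per variable, computed over all series pooled together.
# Any other fixed values could be used here instead.
pooled <- do.call(rbind, my_list)
global_mean <- colMeans(pooled)
global_sd <- apply(pooled, 2, sd)

# Apply the same values to every series.
normalize <- function(x) {
  sweep(sweep(x, 2, global_mean, "-"), 2, global_sd, "/")
}
data <- lapply(my_list, normalize)

# Because every series shares the same values, the inverse transform is
# the same for any series on the normalized scale.
denormalize <- function(x) {
  sweep(sweep(x, 2, global_sd, "*"), 2, global_mean, "+")
}

# Round trip: de-normalizing recovers the original series.
stopifnot(isTRUE(all.equal(denormalize(data[[1]]), my_list[[1]])))
```

The list `data` could then be passed to tsclust in place of the zscore output. Since a single mean/SD pair per variable applies to all inputs, it should be reasonable to apply `denormalize` to the resulting DBA centroids as well, with no need to read attributes from them.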