asardaes / dtwclust

R Package for Time Series Clustering Along with Optimizations for DTW
https://cran.r-project.org/package=dtwclust
GNU General Public License v3.0

Multivariate clustering with different distances/prototyping functions for each variable v #29

zdegreve closed 6 years ago

zdegreve commented 6 years ago

Dear Alexis,

I am looking into performing multivariate time series clustering on variables of different natures. In that context, the best distance is not the same for each variable (the same holds for prototyping functions).

So my question: is there a straightforward way to create "custom" multivariate distances, in which the distance used for each variable v can differ? Moreover, since the distances for each v may take very different values, I suppose they should be normalized in some way (to be defined) when computing the global distance (i.e. the one resulting from the summation over all variables v).

If there is no simple way to do that, I would be glad to contribute (please just indicate which functions I should focus on).

Of course, the same question applies to prototyping.

Thanks a lot in advance.

All the best,

Zacharie

asardaes commented 6 years ago

Hello, I think this would be possible, though I'm not sure it would be "easy".

For the centroid, you could look at what shape extraction does. The reshape_multivariate helper extracts the variables and puts them in lists, roughly like this:

mv
 |- series
    |- list1
        |- variable1 from series1
        |- variable1 from series2
        |- ...
    |- list2
        |- variable2 from series1
        |- variable2 from series2
        |- ...
    |- ...
 |- cent
    |- list with variable1 from centroid
    |- list with variable2 from centroid
    |- ...

You could probably do something similar and pass a column index to Map, and apply a different centroid function depending on the index.
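A minimal sketch of that index-based dispatch in base R, assuming all series have the same length so that a pointwise summary is meaningful. The `variables` list and the `pointwise` helper are hypothetical stand-ins for the list1/list2 layout above, and mean/median are placeholders for whatever centroid function suits each variable:

```r
# Toy data mimicking the layout above: one inner list per variable,
# each holding that variable from three series (equal lengths assumed).
variables <- list(
  list(c(1, 2, 3), c(3, 4, 5), c(5, 6, 7)),  # variable1 from series1..3
  list(c(0, 0, 9), c(0, 6, 9), c(3, 6, 9))   # variable2 from series1..3
)

# Hypothetical helper: stack the series as rows and summarize each time point
pointwise <- function(series, f) apply(do.call(rbind, series), 2L, f)

# Pick a different centroid function depending on the variable's index
centroids <- Map(function(series, index) {
  switch(index,
         pointwise(series, mean),    # first variable
         pointwise(series, median))  # second variable
}, variables, seq_along(variables))

# Bind the per-variable centroids back into one multivariate series
centroid <- do.call(cbind, centroids)
```

Here `centroid` has one column per variable, which matches how multivariate series are stored as matrices in the package.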

I guess it would work similarly for a distance, where you would apply a different distance to list1, list2 and so on. You'd have to register the distance in proxy::pr_DB with loop = FALSE, and then I guess you'd have to calculate a cross-distance matrix for each list of variables and then aggregate the matrices somehow.
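For the "aggregate the matrices somehow" step, one option that also touches on the normalization raised in the question is to scale each per-variable matrix before averaging. The following is only a sketch: `aggregate_distances` and its `weights` argument are hypothetical names, not part of dtwclust or proxy, and dividing by each matrix's maximum is just one data-dependent choice among several.

```r
# Hypothetical helper: divide each cross-distance matrix by its largest
# entry so all variables share a comparable scale, then take a weighted mean.
aggregate_distances <- function(matrices, weights = rep(1, length(matrices))) {
  stopifnot(length(matrices) == length(weights),
            all(weights >= 0), sum(weights) > 0)
  scaled <- Map(function(m, w) {
    top <- max(m)
    # Leave an all-zero matrix untouched to avoid dividing by zero
    if (top > 0) m <- m / top
    m * w
  }, matrices, weights)
  Reduce(`+`, scaled) / sum(weights)
}

# Toy matrices for two variables whose distances live on very different scales
d1 <- matrix(c(0, 10, 10, 0), nrow = 2L)
d2 <- matrix(c(0, 0.5, 0.5, 0), nrow = 2L)
agg <- aggregate_distances(list(d1, d2))
```

Because the scaling depends on the matrix being computed, the same pair of series can get a slightly different aggregated value in different calls (for example series-versus-centroids at different iterations), so fixed per-variable scale factors may be preferable in practice.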

asardaes commented 6 years ago

Here's an example of the distance case:

library(dtwclust)

custom_dist <- function(x, y = NULL, ...) {
  x <- dtwclust:::reshape_multivariate(x, NULL)$series

  if (is.null(y))
    y <- x
  else
    y <- dtwclust:::reshape_multivariate(y, NULL)$series

  # 1L:3L hard-coded here because I expect 3 variables per series
  distance_matrices <- Map(x, y, 1L:3L, f = function(x, y, index) {
    switch(index,
           # first variables
           proxy::dist(x, y, method="dtw_basic", normalize=TRUE),
           # second variables
           proxy::dist(x, y, method="gak", sigma=100),
           # third variables
           proxy::dist(x, y, method="sbd")
    )
  })

  # aggregate as simple average
  Reduce("+", distance_matrices) / length(distance_matrices)
}

proxy::pr_DB$set_entry(FUN = custom_dist, names = c("custom"), distance = TRUE, loop = FALSE)

pc <- tsclust(CharTrajMV[1L:10L], k = 2L, distance = "custom", trace = TRUE, seed = 192L)

asardaes commented 6 years ago

For the centroid, I'd start by copying the allcent function (https://github.com/asardaes/dtwclust/blob/master/R/CLUSTERING-all-cent2.R#L197) and changing the do.call part so that it calls whatever centroids you want to use.
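As an alternative to copying allcent, tsclust also accepts a function as its centroid argument for partitional clustering, which receives x, cl_id, k, cent and cl_old. The sketch below follows that signature but is otherwise an assumption-laden illustration: `variable_summaries` is a hypothetical list with one summary function per variable, and the pointwise approach only makes sense if all series have the same length.

```r
# Hypothetical per-variable summaries; one entry per column of each series
variable_summaries <- list(mean, median)

# Custom centroid following the argument names tsclust passes for
# partitional clustering (cl_id is an integer vector in that case)
custom_centroid <- function(x, cl_id, k, cent, cl_old, ...) {
  lapply(seq_len(k), function(cluster) {
    members <- x[cl_id == cluster]
    # Keep the previous centroid if a cluster ends up empty
    if (length(members) == 0L) return(cent[[cluster]])
    columns <- lapply(seq_along(variable_summaries), function(v) {
      # Stack variable v of every member as rows, then summarize each time point
      stacked <- do.call(rbind, lapply(members, function(m) m[, v]))
      apply(stacked, 2L, variable_summaries[[v]])
    })
    do.call(cbind, columns)
  })
}

# Toy check with three bivariate series of length 3
s1 <- cbind(c(1, 2, 3), c(0, 0, 0))
s2 <- cbind(c(3, 4, 5), c(2, 2, 2))
s3 <- cbind(c(10, 10, 10), c(5, 5, 5))
new_cent <- custom_centroid(list(s1, s2, s3), cl_id = c(1L, 1L, 2L), k = 2L,
                            cent = list(s1, s3), cl_old = c(1L, 1L, 2L))
```

A function like this could then be passed as centroid = custom_centroid alongside distance = "custom", after bringing the series to a common length.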

zdegreve commented 6 years ago

Thanks a lot, Alexis. I will implement that and keep you informed.

All the best,

Zacharie
