Closed zdegreve closed 6 years ago
Hello, I think this would be possible, though I'm not sure it would be "easy".
For the centroid, you could see what shape extraction does. The reshape_multivariate
helper extracts variables and puts them in lists roughly like this:
mv
 |- series
     |- list1
         |- variable1 from series1
         |- variable1 from series2
         |- ...
     |- list2
         |- variable2 from series1
         |- variable2 from series2
         |- ...
     |- ...
 |- cent
     |- list with variable1 from centroid
     |- list with variable2 from centroid
     |- ...
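To make that layout concrete, here is a minimal base-R sketch of the regrouping. It is not dtwclust's internal code: split_by_variable and the toy data are hypothetical, and it assumes each series is a matrix with one column per variable.

```r
# Hypothetical stand-in for the reshaping step (not dtwclust's internal code).
# Assumption: each series is a matrix with one column per variable.
series <- list(
  s1 = cbind(v1 = c(1, 2, 3), v2 = c(10, 20, 30)),
  s2 = cbind(v1 = c(4, 5), v2 = c(40, 50))
)

# Build one list per variable, each holding that variable from every series.
split_by_variable <- function(series) {
  n_vars <- ncol(series[[1L]])
  lapply(seq_len(n_vars), function(j) {
    lapply(series, function(s) s[, j])
  })
}

by_var <- split_by_variable(series)
str(by_var)
```

Here by_var[[1]] plays the role of list1 and by_var[[2]] the role of list2 in the tree above.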
You could probably do something similar and pass a column index to Map, and apply a different centroid function depending on the index.
I guess it would work similarly for a distance, where you would apply a different distance to list1, list2 and so on. You'd have to register the distance in proxy::pr_DB with loop = FALSE, and then I guess you'd have to calculate a cross-distance matrix for each list of variables and then aggregate the matrices somehow.
Here's an example of the distance case:
library(dtwclust)

custom_dist <- function(x, y = NULL, ...) {
    x <- dtwclust:::reshape_multivariate(x, NULL)$series
    if (is.null(y))
        y <- x
    else
        y <- dtwclust:::reshape_multivariate(y, NULL)$series

    # 1L:3L hard-coded here because I expect 3 variables per series
    distance_matrices <- Map(x, y, 1L:3L, f = function(x, y, index) {
        switch(index,
               # first variables
               proxy::dist(x, y, method = "dtw_basic", normalize = TRUE),
               # second variables
               proxy::dist(x, y, method = "gak", sigma = 100),
               # third variables
               proxy::dist(x, y, method = "sbd")
        )
    })

    # aggregate as simple average
    Reduce("+", distance_matrices) / length(distance_matrices)
}

proxy::pr_DB$set_entry(FUN = custom_dist, names = c("custom"), distance = TRUE, loop = FALSE)

pc <- tsclust(CharTrajMV[1L:10L], k = 2L, distance = "custom", trace = TRUE, seed = 192L)
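The simple average treats all three distances as if they were on the same scale, which may not hold. One possible alternative, sketched below in base R, divides each matrix by a fixed scale before averaging. The helper aggregate_scaled and the scale values are hypothetical; the scales are assumed to be chosen once in advance, since a per-call normalization (for example dividing by each matrix's maximum) could give inconsistent values when the function is called on different subsets of series.

```r
# Hypothetical helper (not part of dtwclust or proxy): combine per-variable
# cross-distance matrices after dividing each by a fixed, pre-chosen scale.
aggregate_scaled <- function(distance_matrices, scales) {
  stopifnot(length(distance_matrices) == length(scales), all(scales > 0))
  scaled <- Map(function(d, s) d / s, distance_matrices, scales)
  Reduce(`+`, scaled) / length(scaled)
}

# Toy matrices on very different scales (made-up values).
d1 <- matrix(c(0, 2, 2, 0), nrow = 2L)
d2 <- matrix(c(0, 300, 300, 0), nrow = 2L)

combined <- aggregate_scaled(list(d1, d2), scales = c(2, 300))
print(combined)
```

Inside custom_dist, a call like this could stand in for the final Reduce line, with one scale per variable.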
For the centroid, I'd start by copying the allcent function and changing the do.call part so that it calls whatever centroids you want to use.
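As a rough illustration of dispatching on the variable index for centroids, here is a base-R sketch. It is not a modification of allcent: per_variable_centroid is a hypothetical function, it assumes equal-length series with two variables, and the pointwise mean and median are placeholders for whatever centroid functions you actually want.

```r
# Hypothetical per-variable centroid for equal-length multivariate series.
# This is not dtwclust's allcent; it only illustrates dispatching on the index.
series <- list(
  cbind(c(1, 2, 3), c(10, 20, 30)),
  cbind(c(3, 4, 5), c(10, 40, 90)),
  cbind(c(5, 6, 7), c(70, 60, 30))
)

per_variable_centroid <- function(series) {
  n_vars <- ncol(series[[1L]])
  columns <- lapply(seq_len(n_vars), function(j) {
    # Matrix with one column per series, holding variable j.
    values <- sapply(series, function(s) s[, j])
    switch(j,
      rowMeans(values),          # variable 1: pointwise mean
      apply(values, 1L, median)  # variable 2: pointwise median
    )
  })
  # Reassemble the per-variable centroids into one multivariate series.
  do.call(cbind, columns)
}

cent <- per_variable_centroid(series)
print(cent)
```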
Thanks a lot Alexis, I will implement that and keep you informed.
All the best,
Zacharie
From: Alexis Sardá notifications@github.com
Sent: Tuesday, 27 March 2018 18:57:59
To: asardaes/dtwclust
Cc: Zacharie DE GREVE; Author
Subject: Re: [asardaes/dtwclust] Multivariate clustering with different distances/prototyping functions for each variable v (#29)
For the centroid, I'd start by copying the allcent function (https://github.com/asardaes/dtwclust/blob/master/R/CLUSTERING-all-cent2.R#L197) and changing the do.call part so that it calls whatever centroids you want to use.
Dear Alexis,
I am exploring the possibility of performing multivariate time series clustering on variables of different natures. In that context, the best distance to use is not the same for each variable (the same holds for prototyping functions).
So my question is: is there a straightforward way to create "custom" multivariate distances, for which the distance used for each variable v can differ? Moreover, since the distances for each v may take very different values, I suppose they should be normalized in some way (to be defined) when computing the global distance (i.e. the one resulting from the summation over all variables v).
If there is no simple way to do that, I would be glad to contribute (please just indicate which functions I should focus on).
Of course, the same question is valid for prototyping.
Thanks a lot in advance.
All the best,
Zacharie