wirginiad closed this issue 5 years ago.
Some distances are not symmetric, and others are only symmetric under certain circumstances. The documentation of the CDM dissimilarity in TSclust states:
"While this dissimilarity is asymptotically symmetric, for short series the differences between diss.CDM(x,y) and diss.CDM(y,x) may be noticeable."
Many functions assume distances are symmetric, including proxy::dist when you only pass x:
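library(TSclust)   # provides diss.CDM
library(dtwclust)  # provides tsclust() and cvi()
set.seed(319L)
series <- lapply(1L:4L, function(.) { rnorm(10L, 10, 10) })
# Register TSclust's diss.CDM in proxy's registry under the name "CDMdis"
proxy::pr_DB$set_entry(FUN=diss.CDM, names="CDMdis", distance=TRUE, loop=TRUE)
# Only x given: proxy assumes symmetry, so the result is symmetric
dm <- proxy::dist(series, method="CDMdis")
base::isSymmetric(base::as.matrix(dm))  # TRUE
# Both x and y given: the whole cross-distance matrix is computed
dm <- proxy::dist(series, series, method="CDMdis")
base::isSymmetric(base::as.matrix(dm))  # FALSE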
Hierarchical clustering and some CVIs also assume symmetry. For example, hclust takes a "dist" object as input, which essentially stores only the lower triangle of the distance matrix plus some extra attributes:
all(as.dist(as.matrix(dm)) == dm[lower.tri(dm)])  # TRUE
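To see what dropping the upper triangle means in practice, here is a base-R sketch with a made-up 3x3 asymmetric matrix (the values are chosen only for illustration, and average linkage is used instead of ward.D so the merges are easy to follow by hand). Because as.dist() keeps only the lower triangle, hclust() sees different inputs depending on which half holds which values, and the resulting clusters change:

```r
# Made-up asymmetric "distance" matrix: m[a, b] = 1 but m[b, a] = 5, m[a, c] = 5 but m[c, a] = 1
m <- matrix(c(0, 1, 5,
              5, 0, 6,
              1, 6, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

isSymmetric(m)  # FALSE

# as.dist() keeps only the lower triangle: d(b, a) = 5, d(c, a) = 1, d(c, b) = 6
d_lower <- as.dist(m)

# Transposing swaps the triangles, so as.dist() now keeps the other half: 1, 5, 6
d_upper <- as.dist(t(m))

h_lower <- hclust(d_lower, method = "average")
h_upper <- hclust(d_upper, method = "average")

cutree(h_lower, k = 2)  # a = 1, b = 2, c = 1: a and c are grouped
cutree(h_upper, k = 2)  # a = 1, b = 1, c = 2: a and b are grouped
```

Transposing the matrix is the same as computing every pair with the arguments swapped, so this is exactly the kind of information hclust() silently ignores when d(x, y) and d(y, x) differ.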
So when a distance is not symmetric, some functions effectively discard part of the information (the upper triangle). The difference may be small, for example due to numerical precision, in which case you can ignore it, but you need to be aware of it; hence the warnings. If the differences should not be ignored, that distance may not be suitable for your data.
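Before deciding whether the asymmetry can be ignored, it can help to quantify it. The following sketch uses a hypothetical helper, asymmetry(), which is not part of dtwclust, TSclust or proxy; it returns the largest absolute difference between d(x, y) and d(y, x), optionally relative to the largest entry of the matrix. The small matrix is made up for illustration; with real data you would pass something like as.matrix(dm) from the cross-distance computation:

```r
# Hypothetical helper (not part of any package): how far is m from being symmetric?
asymmetry <- function(m, relative = TRUE) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  # |d(x, y) - d(y, x)| for every pair; the diagonal contributes 0
  max_diff <- max(abs(m - t(m)))
  if (relative) max_diff / max(abs(m)) else max_diff
}

# Made-up "cross-distance" matrix with small asymmetries
m <- matrix(c(0,    1.00, 2.00,
              1.01, 0,    3.00,
              2.00, 2.97, 0),
            nrow = 3, byrow = TRUE)

asymmetry(m, relative = FALSE)  # about 0.03
asymmetry(m)                    # about 0.01, i.e. 1% of the largest distance
```

Whether a given level of asymmetry is negligible depends on your data and on how well separated the clusters are; the threshold is a judgment call, not something the packages decide for you.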
Also note that the distances included in dtwclust have custom proxy loops, so they don't assume symmetry based on whether only x or both x and y were provided. For example, SBD is always symmetric, but it is never safe to assume that lb_keogh or lb_improved are symmetric, so something like proxy::dist(series, method="lb_keogh", window.size=1L) will always calculate the whole matrix, not just the lower triangle.
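The difference between the two strategies can be sketched in base R. This is not how proxy or dtwclust are implemented internally; lower_only(), full_matrix() and the toy dissimilarity asym_diss() are hypothetical names used only for illustration. asym_diss() sums only the positive parts of x - y, so swapping its arguments generally changes the result:

```r
# Hypothetical toy dissimilarity: d(x, x) = 0, but d(x, y) != d(y, x) in general
asym_diss <- function(x, y) sum(pmax(x - y, 0))

# Strategy 1: compute only the lower triangle and mirror it (assumes symmetry)
lower_only <- function(series, f) {
  n <- length(series)
  m <- matrix(0, n, n)
  for (j in seq_len(n - 1L)) {
    for (i in (j + 1L):n) {
      m[i, j] <- f(series[[i]], series[[j]])
      m[j, i] <- m[i, j]  # copied, never computed
    }
  }
  m
}

# Strategy 2: compute every pair in both orders (no symmetry assumption)
full_matrix <- function(series, f) {
  n <- length(series)
  m <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- f(series[[i]], series[[j]])
    }
  }
  m
}

set.seed(319L)
series <- lapply(1L:4L, function(.) rnorm(10L, 10, 10))
isSymmetric(lower_only(series, asym_diss))   # TRUE by construction
isSymmetric(full_matrix(series, asym_diss))  # FALSE for this asymmetric function
```

Mirroring needs only n(n-1)/2 evaluations instead of n^2, which is why assuming symmetry is attractive when only x is given, but the result is only correct if the distance really is symmetric.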
Thank you for your help, and sorry for the misleading title (I started with an error, which I have since fixed). Some research suggested the CDM measure as the most suitable one for macroeconomic data, and I use it as a kind of robustness check. I didn't know that some CVIs require symmetry. I guess I need to look into CVIs more deeply.
Hi, I am trying to use the CDM dissimilarity from TSclust and to compare CVIs for different numbers of clusters. After running
proxy::dist(data, method = "CDMdis")
p1 <- tsclust(data, type = "hierarchical", k = 2:5, distance = "CDMdis", control = hierarchical_control(method = "ward.D"))
I get the warning "Distance matrix is not symmetric, and hierarchical clustering assumes it is (it ignores the upper triangular)." After running
sapply(p1, cvi, type = "internal")
the indices are computed, but I get these warnings:
Warning messages:
1: In FUN(X[[i]], ...) :
  Internal CVIs: series' cross-distance matrix is NOT symmetric, which can be problematic for: Sil D COP
I guess I am doing something wrong. I'd be grateful if you could help me.