Closed kahaaga closed 1 year ago
Actually, this measure also comes in a conditional version, with an explicit conditional independence test (https://www.tandfonline.com/doi/full/10.1080/01621459.2014.993081?casa_token=KcRYvET0K2IAAAAA%3AlsLIH_bGiF2hwbgjQANQ7D-RxDBQzS-BWMqDi2BQJCd0IVgL_mj2RZ9BUiBuHVxnFDfGIrZO14Uj).
Thus, I think it belongs in CausalityTools after all, so I'm closing this for now. The issue can be reopened if there are very good reasons for putting it here instead.
Describe the feature you'd like to have
Lately, I encountered a situation where I needed to compute the similarity between two datasets, where each can be of arbitrary dimension. That led me to the distance correlation metric.
I see that this package already offers some ways of computing distances between datasets. Would it be appropriate to offer the distance correlation here too?
Disclaimer: I'm offering this as a utility method in the upcoming v2 release of CausalityTools, but I thought it might be of more widespread use here. Not sure where it belongs though. It kind of fits in with mutual information, but then again, it isn't entropy-based, like all the other information measures there.
Cite scientific papers related to the feature/algorithm
The algorithm was introduced in Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769-2794.
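For reference, the sample statistic from that paper can be summarised roughly as follows. The notation ($a_{jk}$, $A_{jk}$, $\operatorname{dCov}_n$, $\operatorname{dCor}_n$) is chosen here for illustration and may differ from the paper's; $B_{jk}$ is built from $Y$ in the same way that $A_{jk}$ is built from $X$.

$$
\begin{aligned}
a_{jk} &= \lVert X_j - X_k \rVert, \qquad
A_{jk} = a_{jk} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \\
\operatorname{dCov}_n^2(X, Y) &= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{jk} B_{jk}, \\
\operatorname{dCor}_n^2(X, Y) &= \frac{\operatorname{dCov}_n^2(X, Y)}{\sqrt{\operatorname{dCov}_n^2(X, X)\,\operatorname{dCov}_n^2(Y, Y)}}
\end{aligned}
$$

When the denominator is zero, $\operatorname{dCor}_n$ is taken to be zero. Since only pairwise distances within each dataset enter, $X$ and $Y$ may have different dimensions.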
If possible, sketch out an implementation strategy
Here's a simple, non-optimized implementation that depends only on Distances.jl and passes analytical tests as well as a comparison with the energy package in R (by the method authors):
And some tests:
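For readers without the Julia code at hand, here is a minimal sketch of the same statistic in plain Python. It is not the Distances.jl-based implementation referred to above: the function names (`distance_correlation` and the underscore-prefixed helpers) are made up for illustration, and it assumes the biased (V-statistic) sample estimator with Euclidean distances.

```python
import math


def _pairwise_distances(points):
    # Euclidean distance between every pair of points (n x n nested list).
    return [[math.dist(p, q) for q in points] for p in points]


def _double_center(d):
    # Subtract row and column means, then add back the grand mean.
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _dcov_sq(a, b):
    # Squared sample distance covariance from two double-centered matrices.
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def distance_correlation(x, y):
    """Sample distance correlation between two datasets.

    x and y are equal-length sequences of points; each point is a tuple or
    list of floats. The two datasets may have different dimensions.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of points")
    a = _double_center(_pairwise_distances(x))
    b = _double_center(_pairwise_distances(y))
    denom = math.sqrt(_dcov_sq(a, a) * _dcov_sq(b, b))
    if denom == 0.0:
        # One of the datasets is constant: defined as zero by convention.
        return 0.0
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(_dcov_sq(a, b), 0.0) / denom)
```

One-dimensional data must be passed as one-element tuples, e.g. `[(1.0,), (2.0,), (3.0,)]`. The sketch builds full n-by-n distance matrices, so time and memory both grow quadratically with the number of points.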