[Feature Request] Implementation of Distance Matrix Computation and Hierarchical Clustering Routines in GPU

First of all, thank you for the great work on this package. I hope it becomes the standard for basic GPU matrix calculations in R, as it is the most recently updated package of its kind and is being actively developed.

Looking through the overloaded functions, it seems the distance matrix computation routines are not implemented (yet?). Most of the older GPU packages for R (gpuR, gmatrix, gputools) included them, and gputools also included routines for hierarchical clustering.

Please advise whether this is a reasonable feature request and whether there are any technical issues in implementing it (with the Torch or TensorFlow backends) to reach feature parity with the older GPU R packages. Thank you.

Thanks for your flattering comment.

We did not implement the dist function, but it does not seem difficult to do: torch for R includes the flexible torch_cdist function, so it should not be hard for us to add dist to the package.

For the time being, a possible replacement is the following:
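To illustrate what torch_cdist provides, here is a minimal sketch that calls it directly on a plain torch tensor, without going through the package. It assumes the torch package is installed; the variable names are only illustrative. The `p` argument selects the norm, so one call covers the Euclidean, Manhattan and maximum distances.

```r
library(torch)

set.seed(1)
A <- matrix(rnorm(3 * 5), 3, 5)

# float64 keeps the results comparable to base R doubles
x <- torch_tensor(A, dtype = torch_float64())

# Pairwise distances between the rows of x; p selects the norm
d_euclidean <- as_array(torch_cdist(x, x, p = 2))
d_manhattan <- as_array(torch_cdist(x, x, p = 1))
d_maximum   <- as_array(torch_cdist(x, x, p = Inf))

# Unlike stats::dist(), the result is a full square matrix, not a "dist" object
dim(d_euclidean)
```

The tensor here lives on the CPU; whether a CUDA device is used depends on how the tensor is created, which this sketch leaves out.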


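# A is assumed to be a gpu.matrix object.
# diag and upper are accepted for compatibility with stats::dist() but are not used.
mydist <- function(A, method = "euclidean", diag = FALSE, upper = FALSE, p = 2) {
  # Accept the misspelling "euclidian", as stats::dist() does
  if (!is.na(pmatch(method, "euclidian")))
    method <- "euclidean"
  METHODS <- c("euclidean", "maximum", "manhattan", "minkowski")
  method <- pmatch(method, METHODS)
  # Validate before using the index; otherwise an NA would break the code below
  if (is.na(method))
    stop("invalid distance method")
  # Map the method to the exponent of the p-norm: euclidean = 2, maximum = Inf,
  # manhattan = 1, minkowski = user-supplied p
  p <- switch(method, 2, Inf, 1, p)

  output <- torch::torch_cdist(A@gm, A@gm, p)
  return(gpu.matrix(output))
}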

A <- matrix(rnorm(3*5),3,5)
GA <- gpu.matrix(A)
dist(A)
mydist(GA)

dist(A,"maximum")
mydist(GA,"maximum")

dist(A,"manhattan")
mydist(GA,"manhattan")
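The mapping from method name to exponent used in mydist can be checked on the CPU with base R alone. The sketch below uses a hypothetical helper, `minkowski_ref`, which is not part of any package; it computes the pairwise p-norm distance with explicit loops and compares the result against stats::dist. Note that dist returns a lower-triangle "dist" object, so as.matrix is needed before comparing it with a full square matrix.

```r
# Hypothetical reference implementation (base R, CPU only, for checking):
# pairwise p-norm distance between the rows of X.
minkowski_ref <- function(X, p) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- abs(X[i, ] - X[j, ])
      # p = Inf is the maximum (Chebyshev) distance
      D[i, j] <- if (is.infinite(p)) max(d) else sum(d^p)^(1 / p)
    }
  }
  D
}

set.seed(1)
A <- matrix(rnorm(3 * 5), 3, 5)

# TRUE when the reference matrix matches stats::dist() for the given method
matches_dist <- function(p, ...) {
  isTRUE(all.equal(minkowski_ref(A, p), as.matrix(dist(A, ...)),
                   check.attributes = FALSE))
}

ok_euclidean <- matches_dist(2, method = "euclidean")
ok_maximum   <- matches_dist(Inf, method = "maximum")
ok_manhattan <- matches_dist(1, method = "manhattan")
ok_minkowski <- matches_dist(3, method = "minkowski", p = 3)
```

The same as.matrix(dist(A)) form is a natural target for comparing the GPU result once it has been copied back to the host.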

The difference in speed is really large. For a 1000 x 1000 matrix, dist takes 2.4 seconds and mydist takes 0.680 milliseconds!

ceslobfer / GPUmatrix, issue #3