Open mjmg opened 1 year ago
Thanks for your flattering comment.
We have not implemented the dist function yet, but it does not seem difficult: torch for R includes the flexible torch_cdist function, so it should not be hard for us to add it to the package.
For the time being, a possible replacement is the following:
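# A is assumed to be a gpu.matrix object (torch backend)
mydist <- function(A, method = "euclidean", diag = FALSE, upper = FALSE, p = 2) {
  # Accept the common misspelling "euclidian", as stats::dist does
  if (!is.na(pmatch(method, "euclidian")))
    method <- "euclidean"
  METHODS <- c("euclidean", "maximum", "manhattan", "minkowski")
  method <- pmatch(method, METHODS)
  # Validate first: an NA method would make the comparisons below fail
  if (is.na(method))
    stop("invalid distance method")
  # Exponent of the p-norm: euclidean = 2, maximum = Inf, manhattan = 1, minkowski = p
  p <- switch(method, 2, Inf, 1, p)
  # diag and upper are accepted for compatibility with dist() but ignored:
  # the full n x n distance matrix is returned
  output <- torch::torch_cdist(A@gm, A@gm, p)
  return(gpu.matrix(output))
}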
A <- matrix(rnorm(3*5),3,5)
GA <- gpu.matrix(A)
dist(A)
mydist(GA)
dist(A,"maximum")
mydist(GA,"maximum")
dist(A,"manhattan")
mydist(GA,"manhattan")
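The mapping from method to the exponent p used above can be checked on the CPU with base R only. The following is a minimal sketch, not part of the package: pnorm_dist is a hypothetical helper written for illustration, and it is compared against stats::dist rather than against torch_cdist.

```r
# Base-R reference for the p-norm (Minkowski) distance between all row pairs.
# pnorm_dist is a hypothetical helper, not part of GPUmatrix or torch.
pnorm_dist <- function(X, p = 2) {
  n <- nrow(X)
  D <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d <- abs(X[i, ] - X[j, ])
      # p = Inf is the limiting case: the largest coordinate difference
      D[i, j] <- if (is.infinite(p)) max(d) else sum(d^p)^(1 / p)
    }
  }
  D
}

set.seed(1)
A <- matrix(rnorm(3 * 5), 3, 5)

# unname() drops the "1", "2", "3" dimnames that as.matrix() adds to a dist object
ref <- function(method, ...) unname(as.matrix(dist(A, method = method, ...)))

all.equal(pnorm_dist(A, 2),   ref("euclidean"))
all.equal(pnorm_dist(A, 1),   ref("manhattan"))
all.equal(pnorm_dist(A, Inf), ref("maximum"))
all.equal(pnorm_dist(A, 3),   ref("minkowski", p = 3))
```

Note that dist returns a compact lower-triangle object, so as.matrix is needed before comparing it with a full n x n result.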
The difference in speed is really large. For a 1000 x 1000 matrix, dist takes 2.4 seconds and mydist takes 0.680 milliseconds!
First of all, thank you for the great work on this package. I hope it becomes the standard for basic GPU matrix calculations in R, as it is the most recently updated package and is under active development.
Looking through the overloaded functions, it seems that distance matrix computation routines are not implemented (yet?). Most of the older GPU R packages included them (gpuR, gmatrix, gputools). gputools also included routines for hierarchical clustering in addition to distance matrix computation.
Please advise whether this is a reasonable feature request and whether there are any technical issues in implementing it (with the Torch or TensorFlow backends), for feature parity with the older GPU R packages. Thank you.
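On the hierarchical clustering point: until a dedicated routine exists, a full n x n distance matrix that has been copied back to the host can presumably be passed to stats::hclust after conversion with as.dist. The sketch below uses base R only. D_full is a stand-in computed on the CPU, since how the matrix is retrieved from the GPU is not covered here.

```r
set.seed(1)
X <- matrix(rnorm(6 * 4), 6, 4)

# Stand-in for a full symmetric distance matrix computed elsewhere
# (for example on the GPU) and already converted to a base R matrix
D_full <- as.matrix(dist(X))

# as.dist keeps only the lower triangle, the form hclust expects
D <- as.dist(D_full)
hc <- hclust(D, method = "average")

# Cut the tree into two clusters; one label per row of X
groups <- cutree(hc, k = 2)
```

The clustering itself still runs on the CPU in this sketch; only the distance computation would be offloaded.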