gdkrmr / dimRed

A Framework for Dimensionality Reduction in R
https://www.guido-kraemer.com/software/dimred/
GNU General Public License v3.0

Changing @stdpars$knn not reflected by UMAP embedding when using "umap-learn" #43

Closed: bksn4xqifa closed this 5 years ago

bksn4xqifa commented 5 years ago

With the reference UMAP implementation (umap-learn 0.3.9, py27_0, conda-forge) installed, dimRed (0.2.3, R-) appears to ignore changes to knn and always uses the default value from umap@stdpars.

library(dimRed)

dat <- loadDataSet("3D S Curve", n = 300)

## use the S4 Class directly:
umap <- UMAP()

umap@stdpars
# $knn
# [1] 15
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb)

umap@stdpars$knn <- 30
umap@stdpars
# $knn
# [1] 30
# 
# $ndim
# [1] 2
# 
# $d
# [1] "euclidean"
# 
# $method
# [1] "umap-learn"

emb <- umap@fun(dat, umap@stdpars)
plot(emb) # same plot, although it should differ because knn changed

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 2, method="naive")
plot(emb2, type = "2vars")

emb2 <- embed(dat, "UMAP", .mute = NULL, knn = 200, method="naive")
plot(emb2, type = "2vars") # same here

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 18.04 (Bionic Beaver)
# 
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_AG.UTF-8       
#  [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_AG.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_AG.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AG.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] dimRed_0.2.3   DRR_0.0.3      CVST_0.2-2     Matrix_1.2-17  kernlab_0.9-27
# 
# loaded via a namespace (and not attached):
#  [1] compiler_3.6.0  magrittr_1.5    tools_3.6.0     yaml_2.2.0      reticulate_1.12 Rcpp_1.0.1     
#  [7] RSpectra_0.14-0 grid_3.6.0      jsonlite_1.6    umap_0.2.2.0    lattice_0.20-38
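
Comparing plots by eye can miss small changes, so a numeric check may make the symptom easier to confirm. The following is a minimal sketch in base R, not part of dimRed: `max_abs_diff` and `same_embedding` are hypothetical helper names, and the toy matrices only stand in for real embeddings. The commented dimRed lines assume the embedded coordinates are stored in the `@data@data` slot of the result.

```r
# Largest absolute coordinate difference between two embeddings
# (numeric matrices or data frames with identical dimensions).
max_abs_diff <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  stopifnot(identical(dim(a), dim(b)))
  max(abs(a - b))
}

# TRUE when the two embeddings are numerically indistinguishable.
same_embedding <- function(a, b, tol = 1e-8) {
  max_abs_diff(a, b) <= tol
}

# Toy stand-ins for two embeddings, so the helpers run without dimRed.
set.seed(1)
e1 <- matrix(rnorm(20), ncol = 2)
e2 <- e1 + 0.5

same_embedding(e1, e1)  # identical input
same_embedding(e1, e2)  # shifted copy

# With dimRed, using the calls from the report above
# (assumption: coordinates live in @data@data):
# emb_a <- embed(dat, "UMAP", .mute = NULL, knn = 2,   method = "naive")
# emb_b <- embed(dat, "UMAP", .mute = NULL, knn = 200, method = "naive")
# same_embedding(emb_a@data@data, emb_b@data@data)
```

If the check returns TRUE for two very different knn values, the parameter is most likely being ignored. A FALSE result alone is weaker evidence, since run-to-run randomness in UMAP could also produce differing coordinates.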

gdkrmr commented 5 years ago

Thanks for finding this. I thought that UMAP was non-deterministic. While I was at it, I also added a predict method for UMAP. You can find the changes in the fix-umap branch. Could you please test it and report whether everything works? #44

bksn4xqifa commented 5 years ago

I can confirm that dimRed_0.2.3.9001 fixes the issue. Thanks!!!

I haven't tested the predict method, though.

gdkrmr commented 5 years ago

I'll put it into master then. Thanks!