elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

error: SpMat::SpMat(): invalid row or column index #62

Open Shawnmhy opened 2 years ago

Shawnmhy commented 2 years ago

Hi there, I am trying to use largeVis to do clustering. I have about 200 datasets, each with roughly 1,000 to 100,000 samples and 2 features (the number of features is consistent). While the largeVis function works for almost all of my datasets, I still get this error message for one of them:


error: SpMat::SpMat(): invalid row or column index
Error in referenceWij(is, x@i, x@x^2, as.integer(threads), perplexity) : 
  SpMat::SpMat(): invalid row or column index
In addition: Warning message:
In largeVis(t(as.matrix(memberships[, c("X", "Y")])), dim = 2, K = K,  :
  The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.

I realized that someone had this problem before, and the solution was to install the branch 'hotfix/twobugs'. I successfully installed that version as well, but no luck. Any ideas? Thanks!

The dataset is here: data.csv

The call I run is: largeVis(t(as.matrix(data[, c('X', 'Y')])), dim = 2, K = K, tree_threshold = 100, max_iter = 5, sgd_batches = 1, threads = 1)

elbamos commented 2 years ago

Hi Shawn...

The most recent branch is feature/backoncran. I just tried this with your data and code and, with some changes for parameters that have been removed from the functions, it ran perfectly.

I note, though, that your dataset has only two input features, and your code would generate a dataset with two output features. LargeVis is a method for dimensionality reduction. Since your data only has two features, I'm not sure what benefit you would obtain by running it through LargeVis. Is your goal to take advantage of the hd clustering features of the package? If so, considering your data size, you may be better off using the dbscan package.
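
For reference, here is a minimal sketch of clustering two-feature data directly with the dbscan package, with no dimensionality reduction step. The data is a synthetic stand-in for the X and Y columns, and the eps and minPts values are placeholders I picked for illustration, not tuned recommendations for the dataset in this issue.

```r
# Sketch only: synthetic stand-in for the two-column (X, Y) data in this issue.
library(dbscan)

set.seed(42)
xy <- rbind(
  cbind(X = rnorm(200, mean = 0, sd = 0.3), Y = rnorm(200, mean = 0, sd = 0.3)),
  cbind(X = rnorm(200, mean = 5, sd = 0.3), Y = rnorm(200, mean = 5, sd = 0.3))
)

# Density-based clustering; eps and minPts are assumed placeholder values.
db <- dbscan(xy, eps = 0.5, minPts = 5)

# Hierarchical variant; it takes minPts but no eps.
hdb <- hdbscan(xy, minPts = 10)

# In both results, cluster 0 marks points treated as noise.
print(table(db$cluster))
print(table(hdb$cluster))
```

Both functions take the samples as rows, so no transpose is needed here, unlike the largeVis call above.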

Shawnmhy commented 2 years ago

Hi elbamos, thank you for your reply. I tried to install this most recent branch but got an error:

remotes::install_github('elbamos/largeVis@feature/backoncran')
Downloading GitHub repo elbamos/largeVis@feature/backoncran
Error: Failed to install 'largeVis' from GitHub:
  Incorrect number of arguments (14), expecting 16 for 'processx_exec'

Any ideas?

The reason I am using the dbscan clustering from largeVis is that I find the clusters it generates more 'realistic' (in my analysis context) than those from the dbscan package.

elbamos commented 2 years ago

Huh... I just tried cutting and pasting your install_github line and it worked properly. I suggest making sure you're using the current version of remotes and related packages and checking your setup.
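
A rough sketch of how to check that, assuming the "related packages" are processx and callr (my guess, based on processx_exec appearing in the error). A mismatch like "Incorrect number of arguments (14), expecting 16" can occur when a package was updated while an older copy was still loaded in the session, so restarting R after updating is worth trying.

```r
# Sketch only: report installed versions of the packages that may be involved.
# The package list is an assumption, not something confirmed in this thread.
pkgs <- c("remotes", "processx", "callr")

installed_versions <- vapply(
  pkgs,
  function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(packageVersion(p))
    } else {
      NA_character_  # package is not installed
    }
  },
  character(1)
)
print(installed_versions)

# To refresh them, run the following in a fresh R session, then restart R
# before retrying the install:
# install.packages(pkgs)
# remotes::install_github("elbamos/largeVis@feature/backoncran")
```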