elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

BuildWijMatrix fails: invalid row or column index #43

Closed billytcl closed 7 years ago

billytcl commented 7 years ago

R 3.2.5; latest version of largeVis; Ubuntu 12.04

Neither largeVis itself nor its step-by-step components work. I narrowed the failure down to the buildWijMatrix step, but I don't know how to proceed.

sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] largeVis_0.2 Matrix_1.2-8 Rcpp_0.12.10

loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.4.1 assertthat_0.1 lazyeval_0.2.0
[5] plyr_1.8.4 tools_3.2.5 gtable_0.2.0 tibble_1.2
[9] ggplot2_2.2.1 grid_3.2.5 munsell_0.4.3 lattice_0.20-34

dim(as.matrix(seurat_sw480@data))
[1] 15843 1691

neighbors <- randomProjectionTreeSearch(as.matrix(seurat_sw480@data), n_trees = 5, max_iter = 1, verbose=T)
Searching for neighbors.
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
**|

edges <- buildEdgeMatrix(data = as.matrix(seurat_sw480@data), neighbors = neighbors)

gc()
            used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   1124907  60.1    1770749   94.6   1770749   94.6
Vcells 118347552 903.0  196989268 1503.0 187146708 1427.9

rm(neighbors)
wij <- buildWijMatrix(edges)

error: SpMat::SpMat(): invalid row or column index
Error in referenceWij(is, x@i, x@x^2, as.integer(threads), perplexity) : SpMat::SpMat(): invalid row or column index

elbamos commented 7 years ago

Well that definitely shouldn't be happening. Can you make that dataset available and I'll take a look?

billytcl commented 7 years ago

Here's a link to the as.matrix(seurat_sw480@data). The var name is now "mat".

https://www.dropbox.com/s/g8qgps5eatc9o27/seurat_sw480_matrix.RData?dl=0

billytcl commented 7 years ago

Hi there -- any updates?

elbamos commented 7 years ago

It'll be a few days before I have a chance to look at it. I'll let you know.

elbamos commented 7 years ago

In your matrix, are the examples rows or columns?

largeVis requires that the examples be columns and the features be rows. With the data in that orientation, I don't reproduce the issue:

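[image: screen shot 2017-04-01 at 5 02 31 pm] https://cloud.githubusercontent.com/assets/10103420/24582408/59b92ff2-16fd-11e7-888c-f9a307767903.png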
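
For readers following along, the orientation requirement mentioned in this comment can be checked with base R before any largeVis call. This is only a minimal sketch: `mat` here is a small hypothetical stand-in (6 features by 4 examples), not the dataset from this issue, and no largeVis functions are called.

```r
# Hypothetical stand-in data: 6 features (rows) x 4 examples (columns).
set.seed(1)
mat <- matrix(rnorm(24), nrow = 6, ncol = 4,
              dimnames = list(paste0("gene", 1:6), paste0("cell", 1:4)))

# With examples as columns, nrow() counts features and ncol() counts examples.
n_features <- nrow(mat)
n_examples <- ncol(mat)
cat("features:", n_features, "examples:", n_examples, "\n")

# If the data were stored the other way round (examples as rows),
# t() would put it back into the examples-as-columns orientation.
examples_as_rows <- t(mat)    # 4 x 6: examples in rows
fixed <- t(examples_as_rows)  # 6 x 4: examples in columns again
```

For the matrix in this thread, `dim()` reports 15843 rows and 1691 columns, so under this convention it holds 15843 features and 1691 examples.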

billytcl commented 7 years ago

The examples (cells) are columns and the features (genes) are rows. I think there may be a memory issue, because I got it to work when using only 500 columns instead of the full 1691. Is there any way to reduce memory usage beyond what the tutorial recommends?

elbamos commented 7 years ago

It's not a memory issue.

So are there 15,000 examples and 1600 features, or 15,000 features and 1600 examples?

billytcl commented 7 years ago

Hi, there are 15,000 features and 1600 examples.

Billy

elbamos commented 7 years ago

Ah. I think I may know what's going on. I'll check this weekend.

elbamos commented 7 years ago

@billytcl I think I've tracked this down. Thank you for posting it! It seems to be a subtle bug that crept in late in the last update. It will take a few days to get a working fix. I will let you know.

Anyway, your data looks like this:

[image: screen shot 2017-04-03 at 3 33 39 am]

elbamos commented 7 years ago

@billytcl Try the version in branch hotfix/twobugs

billytcl commented 7 years ago

Thanks!! Will give it a shot.

billytcl commented 7 years ago

Looks like it's working! Thanks!!

elbamos commented 7 years ago

Cool!

Btw, with datasets like the one you sent me, I think you'll be a lot happier if you convert the data to a sparse matrix. If you do that, it'll scale to very, very large datasets before you hit a memory issue.
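
As a rough illustration of the sparse-matrix suggestion, the sketch below converts a dense matrix with the Matrix package, which already appears in the sessionInfo() output above. The data is a hypothetical, mostly-zero stand-in rather than the dataset from this issue, and the sketch stops at the conversion; it assumes, as the comment implies, that the largeVis functions accept the resulting sparse class.

```r
library(Matrix)

# Hypothetical stand-in for expression data: mostly zeros, features in rows.
set.seed(1)
dense <- matrix(0, nrow = 200, ncol = 50)
dense[sample(length(dense), 500)] <- runif(500, min = 0.1, max = 1)

# Convert to sparse storage; only the non-zero entries are kept.
sparse <- Matrix(dense, sparse = TRUE)

# Compare the memory footprint of the two representations.
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")
```

The saving depends on how many entries are actually zero, so a dense-looking dataset may gain little from this.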

elbamos commented 7 years ago

I'm going to close this now. Please let me know if anything else comes up, and thank you for the report!