elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

caught segfault : 'memory not mapped' #45

Closed NagaComBio closed 7 years ago

NagaComBio commented 7 years ago

I got the following segfault with largeVis. Since the 'bench' branch was not available, I recompiled from github/master without OpenMP, as suggested here.

To compile without OpenMP, I set up the Makevars file as follows:

PKG_LIBS = $(FLIBS) $(LAPACK_LIBS) $(BLAS_LIBS)
PKG_CXXFLAGS = -DARMA_64BIT_WORD -DNDEBUG
CXX_STD=CXX11
LDFLAGS = $(LDFLAGS)

and compiled as

R-3.3.1 CMD INSTALL largeVis-master

The error message:

> library(largeVis)
Loading required package: Rcpp
Loading required package: Matrix

Attaching package: ‘Matrix’

The following object is masked from ‘package:tidyr’:

    expand

largeVis was compiled without OpenMP support.
> neig<-randomProjectionTreeSearch(t(dat.small.matrix), K=10, tree_threshold = 100, max_iter = 15, n_trees = 10)

 *** caught segfault ***
address 0x75a8, cause 'memory not mapped'

Traceback:
 1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, maxIter, data, distMethod, seed, threads, verbose)
 2: searchTrees(threshold = as.integer(tree_threshold), n_trees = as.integer(n_trees),     K = as.integer(K), maxIter = as.integer(max_iter), data = x,     distMethod = as.character(distance_method), seed = seed,     threads = threads, verbose = as.logical(verbose))
 3: randomProjectionTreeSearch.matrix(t(dat.small.matrix), K = 10,     tree_threshold = 100, max_iter = 15, n_trees = 10)
 4: randomProjectionTreeSearch(t(dat.small.matrix), K = 10, tree_threshold = 100,     max_iter = 15, n_trees = 10)

Maybe I didn't compile it properly since the error still occurs in the 'multiprocessing step'.
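
One way to cross-check whether a build really had OpenMP switched off is the standard `_OPENMP` preprocessor macro, which a compiler defines only when OpenMP is enabled (for GCC, with `-fopenmp`). The sketch below is not taken from largeVis; the function names `built_with_openmp` and `openmp_summary` are hypothetical and only illustrate the check.

```cpp
#include <string>

// Hypothetical helper, not part of largeVis.
// Returns true when this translation unit was compiled with OpenMP enabled.
inline bool built_with_openmp() {
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Human-readable summary. When OpenMP is on, _OPENMP expands to the date of
// the supported specification in yyyymm form.
inline std::string openmp_summary() {
#ifdef _OPENMP
  return "OpenMP enabled, spec date " + std::to_string(_OPENMP);
#else
  return "OpenMP disabled";
#endif
}
```

Since the segfault here occurs even in a build that reports no OpenMP support, a check like this mainly helps rule threading out as the cause.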

elbamos commented 7 years ago

Can you share the dataset and your system configuration?

NagaComBio commented 7 years ago

Here is some random sample data that triggers the crash. The original data ran perfectly on Mac OS, but replicating the run on Linux hit this issue.

dat<-data.frame(x=rnorm(n=10000, mean=0.5, sd=0.1), y=rnorm(n=1000, mean=0.5, 0.1)) %>% 
  rbind(data.frame(x=rnorm(n=1000, mean=0.25, sd=0.05), y=rnorm(n=1000, mean=0.25, 0.05))) %>% 
  rbind(data.frame(x=rnorm(n=1000, mean=0.15, sd=0.05), y=rnorm(n=1000, mean=0.5, 0.05)))

dat.small.matrix <- dat %>% as.matrix.noquote() %>% apply(2, as.numeric)

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             8
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            16
Model:                 4
Model name:            Quad-Core AMD Opteron(tm) Processor 8384
Stepping:              2
CPU MHz:               2693.060
BogoMIPS:              5385.73
Virtualization:        AMD-V
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
NUMA node1 CPU(s):     4-7
NUMA node2 CPU(s):     8-11
NUMA node3 CPU(s):     12-15
NUMA node4 CPU(s):     16-19
NUMA node5 CPU(s):     20-23
NUMA node6 CPU(s):     24-27
NUMA node7 CPU(s):     28-31

elbamos commented 7 years ago

No, I'm asking for the actual data that fails, not for code to generate data that might fail. I'm also not sure what the second line of that code is intended to do. The description of your system also needs to include the OS, the R version, and so on. And I'm not sure which version of largeVis you were using: your post refers to a version from more than a year ago.

NagaComBio commented 7 years ago

OK, I did test the sample data before posting it here, and as mentioned it also failed. Here is the original data set, along with the R session info.

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: openSUSE 13.1 (Bottle) (x86_64)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] largeVis_0.2    Matrix_1.2-7.1  Rcpp_0.12.10    dplyr_0.5.0     purrr_0.2.2     readr_1.1.0     tidyr_0.6.0     tibble_1.2     
 [9] ggplot2_2.2.1   tidyverse_1.1.1

loaded via a namespace (and not attached):
 [1] xml2_1.1.1       magrittr_1.5     hms_0.3          rvest_0.3.2      mnormt_1.5-5     munsell_0.4.3    colorspace_1.3-2 lattice_0.20-34 
 [9] R6_2.2.0         httr_1.2.1       stringr_1.1.0    plyr_1.8.4       tools_3.3.1      parallel_3.3.1   grid_3.3.1       broom_0.4.2     
[17] nlme_3.1-128     gtable_0.2.0     psych_1.7.3.21   DBI_0.6          modelr_0.1.0     readxl_0.1.1     lazyeval_0.2.0   assertthat_0.1  
[25] reshape2_1.4.2   haven_1.0.0      stringi_1.1.2    forcats_0.2.0    scales_0.4.1     lubridate_1.6.0  jsonlite_1.3     foreign_0.8-67

idroz commented 7 years ago

Encountered a similar issue on Ubuntu Trusty:

c <- largeVis(t(data.matrix(iris[,1:4])))
*** caught segfault ***
address 0x7f8, cause 'memory not mapped'
Traceback:
1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, maxIter, data, distMethod, seed, threads, verbose)

2: searchTrees(threshold = as.integer(tree_threshold), n_trees = as.integer(n_trees),     K = as.integer(K), maxIter = as.integer(max_iter), data = x,     distMethod = as.character(distance_method), seed = seed,     threads = threads, verbose = as.logical(verbose))

3: randomProjectionTreeSearch.matrix(x, n_trees = n_trees, tree_threshold = tree_threshold, K = K, max_iter = max_iter, distance_method = distance_method,     threads, verbose = verbose)

4: randomProjectionTreeSearch(x, n_trees = n_trees, tree_threshold = tree_threshold,     K = K, max_iter = max_iter, distance_method = distance_method,     threads, verbose = verbose)

5: largeVis(t(data.matrix(iris[, 1:4])))

Tested with and without the OpenMP flag; the error persists. It seems to be Linux-specific, as it works perfectly fine on a Mac.

And sessionInfo():

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] largeVis_0.2  Matrix_1.2-10 Rcpp_0.12.10

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.4.1     compiler_3.4.0   lazyeval_0.2.0
 [5] plyr_1.8.4       gtable_0.2.0     tibble_1.3.0     ggplot2_2.2.1
 [9] grid_3.4.0       munsell_0.4.3    lattice_0.20-35

elbamos commented 7 years ago

@idroz Thanks for that. Could I ask you for a couple of small things?

First is, can you try with R 3.3 on the same system and see if you see the error?

Second is, can you try with the branch that's up here as release/0.2.1?

R 3.4 has changed a bunch of things in how packages with C++ code need to integrate with R, and it's really complicated testing for release.

idroz commented 7 years ago

Thanks for that. Tried with R 3.3.1 and release/0.2.1 as well as the master branch; unfortunately the segfault persists. Will try to play around with Makevars to see if any flags might be contributing to this issue.

elbamos commented 7 years ago

Thanks, @idroz, working on it, appreciate your reporting.

elbamos commented 7 years ago

I'm able to reproduce. I'll try to push an update out soon. It probably is tied to a compiler setting; if you make any progress on that front let me know.

elbamos commented 7 years ago

Very strange... I can reproduce this on my AWS box but not on my Linux box at home, both running 16.04.

idroz commented 7 years ago

I managed to get it to work on my 14.04 box. I had to upgrade gcc/g++/gfortran to version 5 and recompile the package. Had versions 4.9 running before that. @elbamos, wonder if you have a similar situation on your AWS box vs home linux?

My /etc/R/Makeconf file had the following changes made to it:

CC = gcc-5 -std=gnu99
CXX = g++-5
CXX1X = g++-5

If you want to change default gcc and g++ to version 5, do:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5

elbamos commented 7 years ago

It would not be shocking if the problem is old gcc, since I make a lot of use of C++11. But it definitely worked in that configuration recently. Are you seeing test failures in any configuration?

idroz commented 7 years ago

Not seeing any test failures. Works perfectly fine from a fresh CRAN install on Ubuntu 16.04 with default gcc/g++ version 5.4.0.

elbamos commented 7 years ago

Yeah I'm real suspicious of gcc right now. Can you confirm what version was generating the crash?

I've had a lot of issues with gcc because the support for c++11 was really incomplete for a long time. I thought that was all fixed by 4.9 though, and I think 4.9 is still common enough I'm going to have to support it.
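
To answer the "which version?" question unambiguously, the compiler's own predefined macros can be reported from the code it built. The following is a minimal sketch under stated assumptions, not something largeVis provides: `compiler_description` is a hypothetical name, and only the GCC and clang macro families are handled. Running `g++ --version` in a shell gives the same information for the default compiler, but not necessarily for the one R was configured to use.

```cpp
#include <string>

// Hypothetical helper, not part of largeVis.
// Describes the compiler that built this translation unit, using the
// compiler's predefined version macros.
inline std::string compiler_description() {
#if defined(__clang__)
  // clang must be tested first because it also defines __GNUC__.
  return "clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__) + "." +
         std::to_string(__clang_patchlevel__);
#elif defined(__GNUC__)
  return "gcc " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__) + "." +
         std::to_string(__GNUC_PATCHLEVEL__);
#else
  return "unknown compiler";
#endif
}
```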

elbamos commented 7 years ago

@idroz I was able to reproduce this on my AWS box once, but no longer. Are you able to isolate the issue to gcc 4.9?

idroz commented 7 years ago

I'm getting consistent 'memory not mapped' with gcc/g++ 4.8.4.

≥4.9 is OK.

elbamos commented 7 years ago

I expect it to fail on 4.8. Do we know what compiler was in use when you got the segfaults?

idroz commented 7 years ago

It was g++

elbamos commented 7 years ago

But which version?

idroz commented 7 years ago

4.8.4

elbamos commented 7 years ago

That's the one generating the segfaults? Not with anything later than that?

4.8 has a bug in its C++11 implementation I can't work around.

idroz commented 7 years ago

Yup - that's the one. 4.9 and 5 are fine. Haven't tested on 6 and up.

elbamos commented 7 years ago

@NagaComBio Can you confirm that you were compiling with gcc before 4.8 also?

NagaComBio commented 7 years ago

@elbamos No, my default version is gcc (SUSE Linux) 4.8.1 20130909. And I just recompiled largeVis/master with gcc (GCC) 6.2.0 and it worked without the segmentation fault.

elbamos commented 7 years ago

Ok - I think the solution here is to throw an error if the user tries to compile with gcc < 4.9. This is an issue in 4.8's implementation of c++11, in the handling of lambdas I believe, and I really don't want to change that part of the code.

How do you guys feel about it?
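
A compile-time guard of this kind can be sketched with the `__GNUC__` and `__GNUC_MINOR__` macros. This is an assumption about how such a check might look; the thread shows only the resulting `#error` text from largeVis.h, not the condition used there. The helper `gcc_version_supported` is a hypothetical name added so the comparison can be checked on its own.

```cpp
// Hypothetical guard, not copied from largeVis.h.
// clang also defines __GNUC__ (as 4, with __GNUC_MINOR__ 2) for
// compatibility, so it is excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__)
#  if (__GNUC__ < 4) || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
#    error "This code requires gcc >= 4.9. Upgrade gcc or use clang."
#  endif
#endif

// The same comparison as a constexpr function, so it can be evaluated for
// arbitrary version numbers rather than only the current compiler.
constexpr bool gcc_version_supported(int major, int minor) {
  return major > 4 || (major == 4 && minor >= 9);
}

// Versions mentioned in the thread: 4.8 crashes, 4.9 and 6.2 work.
static_assert(!gcc_version_supported(4, 8), "4.8 is rejected");
static_assert(gcc_version_supported(4, 9), "4.9 is accepted");
static_assert(gcc_version_supported(6, 2), "6.2 is accepted");
```

Failing at compile time trades a confusing runtime segfault for an explicit message, at the cost of refusing to build on systems whose default compiler is still gcc 4.8.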

idroz commented 7 years ago

@elbamos - sounds like a good way around the issue. Thanks a lot for looking into it.

NagaComBio commented 7 years ago

I agree as well. Thank you @elbamos @idroz.

elbamos commented 7 years ago

Thanks guys!

I have a version up in the develop branch. It should fail to compile on gcc < 4.9 but work fine for you otherwise. If you want to give it a try, I'll close this issue.

NagaComBio commented 7 years ago

Just tested the 'develop' branch: with gcc < 4.9 I got the error message below, and the same version compiled and worked fine with gcc 6.2.0. Cheers.

In file included from checkfunctions.cpp:1:0:
largeVis.h:7:2: error: #error largeVis is incompatible with gcc < 4.9. Upgrade gcc or use llvm.
 #error largeVis is incompatible with gcc < 4.9. Upgrade gcc or use llvm.
  ^
make: *** [checkfunctions.o] Error 1
ERROR: compilation failed for package ‘largeVis’

idroz commented 7 years ago

Works great 👍