KlugerLab / FIt-SNE

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

segfault: invalid permissions #30

Open jowkar opened 6 years ago

jowkar commented 6 years ago

After installing the current version, using R 3.5.0 and Fedora 28, the example code in examples/test.R runs fine. However, when attempting to analyze a larger dataset (a matrix with dimensions 29759 x 33650), the following error results:

caught segfault address 0x7fd57089d000, cause 'invalid permissions'

Traceback:
 1: writeBin(tX, f)
 2: fftRtsne(X)
An irrecoverable exception occurred. R is aborting now ...

Any idea what could be the problem? RAM usage at the point of failure is about 59%, so it does not seem to be an out-of-memory error. Running R as root did not solve the problem either.

dkobak commented 6 years ago

I don't know why this happens with writeBin(), but note that a dimensionality of 33650 is far too high to be useful and will likely only cause problems for the annoy nearest-neighbours search. I'd recommend doing PCA first and then passing only something like 50 PCs (i.e. a 29759 x 50 matrix) to FIt-SNE.

linqiaozhi commented 6 years ago

That really is strange. I can replicate the error by trying to write a vector of length 3E4*3E4 to disk using writeBin. We need to look into this more, but @dkobak is right: you probably don't want to apply t-SNE to a matrix of such high dimension. You should consider doing PCA first (with randomized PCA or Lanczos methods) and then running t-SNE.

The ultimate solution here is to avoid writing to disk entirely (e.g. with Rcpp) and pass the matrix and other variables directly to the C++ code. But still, I would not have expected writeBin() to give a segfault.
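To put the sizes mentioned so far side by side, here is a small arithmetic sketch. It assumes each matrix entry is stored as an 8-byte double (the constant `kBytesPerValue` is that assumption), and the helper names `matrix_bytes` and `exceeds_int32` are made up for illustration. It only shows how large the buffers are; it does not establish that a 32-bit size limit is what triggers the segfault.

```cpp
#include <cstdint>
#include <limits>

// Assumption: every matrix entry is written as an 8-byte double.
constexpr std::uint64_t kBytesPerValue = 8;

// Bytes needed for a rows x cols matrix, computed in 64-bit arithmetic
// so the product itself cannot wrap around for sizes like these.
constexpr std::uint64_t matrix_bytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * kBytesPerValue;
}

// True if a byte count no longer fits in a signed 32-bit integer
// (i.e. it is larger than 2^31 - 1 = 2147483647).
constexpr bool exceeds_int32(std::uint64_t bytes) {
    return bytes > static_cast<std::uint64_t>(
                       std::numeric_limits<std::int32_t>::max());
}
```

Under that assumption, `matrix_bytes(29759, 33650)` is 8011122800 bytes (about 8 GB) and `matrix_bytes(30000, 30000)` is 7200000000 bytes; both exceed 2^31 - 1 and also 2^32. The PCA-reduced input, `matrix_bytes(29759, 50)`, is 11903600 bytes (about 12 MB), far below either threshold.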

ms609 commented 4 years ago

If it helps diagnose, I've encountered the same problem in a package that I'm writing. I believe that the error arises when trying to calloc an array of a size that requires more than 2^32 bytes of memory, i.e. x * y * sizeof(int) is close to 2^32. (I write 'close to' rather than 'more than' as there's an additional memory overhead of a few bytes associated with the array that seems to be counted towards the 2^32 byte limit.) I see the error on Fedora, but not on Windows, where calloc fails gracefully and returns NULL; presumably there are differences in memory management between the platforms?
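For anyone hitting the same thing in their own code, a defensive allocation wrapper might look like the sketch below. The helper name `alloc_int_matrix` is hypothetical and is not part of FIt-SNE or of the package mentioned above. It rejects requests whose byte count would overflow `size_t` and passes a NULL result from `calloc` back to the caller. It cannot detect the case where the allocation succeeds but later access faults (Linux memory overcommit can produce that pattern; whether that is what happens here is an assumption, not something this thread confirms).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: allocate a zero-initialised rows x cols int array.
// Returns nullptr when the request is empty, when the byte count would
// overflow size_t, or when calloc itself reports failure.
int* alloc_int_matrix(std::size_t rows, std::size_t cols) {
    if (rows == 0 || cols == 0) {
        return nullptr;
    }
    // rows * cols * sizeof(int) fits in size_t only if
    // rows <= (SIZE_MAX / sizeof(int)) / cols.
    if (rows > SIZE_MAX / sizeof(int) / cols) {
        return nullptr;
    }
    // calloc may still return nullptr if the system refuses the request.
    return static_cast<int*>(std::calloc(rows * cols, sizeof(int)));
}
```

A caller would check the result before use, e.g. `int* m = alloc_int_matrix(x, y); if (m == nullptr) { /* report the error instead of continuing */ }`, and release the array with `std::free(m)` when done.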