cstoeckert / iterativeWGCNA

Extension of the WGCNA program to improve the eigengene similarity of modules and increase the overall number of genes in modules.
GNU General Public License v2.0

Segmentation fault error #21

Closed by johnmb25 6 years ago

johnmb25 commented 6 years ago

I'm attempting to run iterativeWGCNA on an expression dataset of ~11,000 genes and ~100 samples. However, when I run iterativeWGCNA from the command line (using the run_iterative_wgcna.py script with required input file argument and a couple WGCNA parameters), the program fails after a few seconds and prints the message "Segmentation fault: 11". From some googling it sounds like this indicates a memory error, but I'm not sure why this is happening. I've also tried running iterativeWGCNA with a smaller maxBlockSize parameter and subsampling my dataset down to 2000 genes, and with both of these setups I still get the same error. Any help would be greatly appreciated.

fossilfriend commented 6 years ago

iterativeWGCNA outputs two log files: iterativeWGCNA.log and iterativeWGCNA-R.log. Does either contain any additional information?

In general the number of samples should not be an issue. The memory issues are (typically) a limitation of WGCNA, or more accurately, of R, which stores matrices in contiguous blocks. So the limiting factor is the size of the correlation matrix (i.e., the number of genes). The number of samples involved in the calculation of the correlations should not be a factor. But ... to be honest I've only tested iterativeWGCNA on sample sizes of ~30 with ~16,000-17,000 expressed genes (run in a single block). I've done this successfully on a machine with 16GB of memory.
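To put rough numbers on that, here is a back-of-the-envelope sketch (not part of iterativeWGCNA). It assumes each gene-by-gene matrix is dense and stored as 8-byte doubles, which is R's default numeric type; the helper name `dense_matrix_gib` is made up for illustration. The number of samples never enters the calculation.

```python
def dense_matrix_gib(n_genes, bytes_per_value=8):
    """Approximate size, in GiB, of one dense n_genes x n_genes matrix.

    Assumes 8-byte doubles; the sample count does not appear at all.
    """
    return n_genes * n_genes * bytes_per_value / 2**30


# Gene counts mentioned in this thread.
for n_genes in (2000, 11000, 16000):
    print(f"{n_genes:>6} genes: {dense_matrix_gib(n_genes):.2f} GiB per matrix")
```

By this estimate a single 16,000-gene matrix is a bit under 2 GiB, and a 2,000-gene matrix is only about 0.03 GiB. WGCNA is likely to hold more than one matrix of this size at once, so treat these figures as a lower bound, but a crash on 2,000 genes is hard to explain by matrix size alone.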

That being said, I'm going to run a test-drive with a larger sample size, as it is possible that something on the Python side is balking at the larger dataset.

johnmb25 commented 6 years ago

The iterativeWGCNA-R.log file is completely empty.

The iterativeWGCNA.log file contains (what I assume to be) the expected information (WGCNA parameter info, input/output directories, etc.). The iterativeWGCNA.log file does not contain any error messages.

fossilfriend commented 6 years ago

On my end I ran a test case on my server with 120 samples and ~16,000 expressed genes with no issues (and no slowdown compared to N=30 samples); again, I expect memory problems to arise as the # of genes increases, not the # of samples.

Thanks for looking into the log files. Does the iterativeWGCNA.log have a line like:

Loaded file: my_file.txt 120 Samples 15937 Genes

If not (which I am assuming), and since the R-log is empty, I would guess that iterativeWGCNA is crashing when it makes a first attempt to load the data (first calls to rpy2). If your data file had been parsed and processing initiated, the logs would have contained something.
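A quick way to check for that line is to scan the log with a short script. This is only a sketch: the pattern is inferred from the single example line above, so the real log may carry a timestamp or other prefix, and `find_loaded_line` is a hypothetical helper, not something iterativeWGCNA provides.

```python
import re

# Pattern assumed from the example line in this thread; \s+ also tolerates
# the fields being split across several lines.
LOADED_RE = re.compile(r"Loaded file:\s*(\S+)\s+(\d+)\s+Samples\s+(\d+)\s+Genes")


def find_loaded_line(log_text):
    """Return (filename, n_samples, n_genes) for the first match, or None."""
    match = LOADED_RE.search(log_text)
    if match is None:
        return None
    name, samples, genes = match.groups()
    return name, int(samples), int(genes)


# Example with the line quoted above; for a real run, pass the contents of
# iterativeWGCNA.log instead.
print(find_loaded_line("Loaded file: my_file.txt 120 Samples 15937 Genes"))
```

If the function returns `None` for the real log, the run most likely stopped before the input file was loaded.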

A segmentation fault is usually a rather ambiguous C error that more often than not has to do with some issue with memory, but not necessarily running out of memory. The problem is my code is Python -- not C. So I would guess that your issue lies with some third-party library that is referenced by either R or rpy2.
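One way to test that guess without involving iterativeWGCNA at all is to import rpy2 in a separate Python process and look at how that process exits. The sketch below uses only the standard library; `run_isolated` is a hypothetical helper, and the signal-based check assumes a POSIX system (Linux or macOS). The `-X faulthandler` option asks Python to print a traceback to stderr if the child crashes.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child interpreter and describe how it exited."""
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # the signal with that number (SIGSEGV for a segmentation fault).
        return f"killed by {signal.Signals(-result.returncode).name}"
    if result.returncode == 0:
        return "ok"
    return f"exited with code {result.returncode}"


if __name__ == "__main__":
    # If this reports SIGSEGV, the crash is in the rpy2/R stack rather than
    # in the dataset or in iterativeWGCNA's own Python code.
    print("import rpy2.robjects:", run_isolated("import rpy2.robjects"))
```

A result of "exited with code 1" usually just means the import raised an ordinary Python exception (for example, rpy2 is not installed in that interpreter), which is a different problem from a segmentation fault.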

Any chance you are using Anaconda? Other folks working with that platform have already run into issues linking C libraries. Or are you on a Mac? rpy2 has some known issues compiling/running on Macs (we touched on one in the troubleshooting section of the readme). Other folks have reported segmentation faults (with Anaconda on Mac OS X):

https://bitbucket.org/rpy2/rpy2/issues/214/segmentation-fault-11-on-mac-os-x-mountain