ZhuMSLab / DecoMetDIA


Memory exhaustion and xcms error #2

Open birnbera opened 4 years ago

birnbera commented 4 years ago

Hello, thanks for making this package. It looks very promising and is quite easy to read (a true accomplishment for R software!).

I am just beginning to test it with some of my data and I've run into a couple of issues that I would like some help with.

When running the software on my SWATH data I have encountered an error due to memory exhaustion as well as a warning regarding centWave (see below). Besides increasing the amount of memory available (i.e. running it on a cluster), are there some settings I should adjust to reduce some of the "background", e.g. the signal-to-noise or bandwidth parameters? Or any filters I should apply during conversion to mzXML?

Regarding the warning from centWave, which I believe is used by xcms, is there some other way that I should be converting my wiff files into mzXML besides the default options from msConvert? Or is this safe to ignore? I'm working more on the data analysis side than the data collection side, so I'm not sure if this is an issue with how the data was collected or if the conversion needs some specific options to generate a centroided mzXML file.

> DecoMetDIA::DecoMetDIA(polarity = 'negative', lc = 'HILIC')

--------------------------------------
Detecting and aligning features ...
--------------------------------------

Detect peaks with xcms ...
Detecting mass traces at 25 ppm ... OK
Detecting chromatographic peaks in 267102 regions of interest ... OK: 113964 found.
1/Users/birnbera/Projects/Lipidomics/DIA/wiff_data/./20191209 KO-ER 1.mzXML  [.]  -->  113964  Features. 

 Sample classes ->  . 
Start grouping after retention time.
Created 581 pseudospectra.
Generating peak matrix!
Run isotope peak annotation
 % finished: 10  20  30  40  50  60  70  80  90  100  
Found isotopes: 21147 
Polarity is set in xsAnnotate: negative 
Found and use user-defined ruleset!
Calculating possible adducts in 581 Groups... 
 % finished: 10  20  30  40  50  60  70  80  90  100  
ALL peak detection work are finished!!

--------------------------------------
Start processing SWATH data...
--------------------------------------

Error: vector memory exhausted (limit reached?)
In addition: Warning message:
In .local(object, ...) :
  It looks like this file is in profile mode. centWave can process only centroid mode data !

Also, as an aside, it looks like the package is changing the working directory rather than reading from the specified input directory and using the supplied input directory for all read/write operations. This causes the DecoMetDIA function to fail since it can't find the files after changing directories:

> DecoMetDIA::DecoMetDIA(d.in = 'wiff_data/', polarity = 'negative', lc = 'HILIC')

--------------------------------------
Detecting and aligning features ...
--------------------------------------

Detect peaks with xcms ...
Error in xcmsSet(files, method = "centWave", ppm = ppm.pd, snthr = sn,  : 
  No NetCDF/mzXML/mzData/mzML files were found.

Another aside, it seems that the package is exporting several of the libraries that it uses into the user's namespace, thus overwriting functions, etc. This is fine for command line use, but is a bit annoying when using the package as a library within other software.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin18.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/openblas/0.3.7/lib/libopenblasp-r0.3.7.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] DecoMetDIA_1.0.0    gtools_3.8.1        splus2R_1.2-2       Cairo_1.5-10        CAMERA_1.42.0      
 [6] xcms_3.8.1          MSnbase_2.12.0      ProtGenerics_1.18.0 S4Vectors_0.24.1    mzR_2.20.0         
[11] BiocParallel_1.20.1 Biobase_2.46.0      BiocGenerics_0.32.0 Rcpp_1.0.3          renv_0.8.3-25      

loaded via a namespace (and not attached):
 [1] vsn_3.54.0             splines_3.6.1          foreach_1.4.7          Formula_1.2-3         
 [5] BiocManager_1.30.10    affy_1.64.0            latticeExtra_0.6-29    RBGL_1.62.1           
 [9] robustbase_0.93-5      impute_1.60.0          pillar_1.4.3           backports_1.1.5       
[13] lattice_0.20-38        limma_3.42.0           digest_0.6.23          RColorBrewer_1.1-2    
[17] checkmate_1.9.4        colorspace_1.4-1       htmltools_0.4.0        preprocessCore_1.48.0 
[21] Matrix_1.2-17          plyr_1.8.5             MALDIquant_1.19.3      XML_3.98-1.20         
[25] pkgconfig_2.0.3        zlibbioc_1.32.0        scales_1.1.0           RANN_2.6.1            
[29] jpeg_0.1-8.1           affyio_1.56.0          htmlTable_1.13.3       tibble_2.1.3          
[33] IRanges_2.20.1         ggplot2_3.2.1          nnet_7.3-12            lazyeval_0.2.2        
[37] MassSpecWavelet_1.52.0 survival_2.44-1.1      magrittr_1.5           crayon_1.3.4          
[41] ncdf4_1.17             doParallel_1.0.15      MASS_7.3-51.4          foreign_0.8-71        
[45] graph_1.64.0           data.table_1.12.8      tools_3.6.1            lifecycle_0.1.0       
[49] stringr_1.4.0          munsell_0.5.0          cluster_2.1.0          pcaMethods_1.78.0     
[53] compiler_3.6.1         mzID_1.24.0            rlang_0.4.2            grid_3.6.1            
[57] iterators_1.0.12       rstudioapi_0.10        htmlwidgets_1.5.1      igraph_1.2.4.2        
[61] base64enc_0.1-3        gtable_0.3.0           codetools_0.2-16       multtest_2.42.0       
[65] R6_2.4.1               gridExtra_2.3          knitr_1.26             Hmisc_4.3-0           
[69] stringi_1.4.3          rpart_4.1-15           acepack_1.4.1          png_0.1-7             
[73] DEoptimR_1.0-8         xfun_0.11             

ZhuMSLab commented 4 years ago

    When running the software on my SWATH data I have encountered an error due to memory exhaustion as well as a warning regarding centWave (see below). Besides increasing the amount of memory available (i.e. running it on a cluster), are there some settings I should adjust to reduce some of the "background", e.g. the signal-to-noise or bandwidth parameters? Or any filters I should apply during conversion to mzXML?

I am sorry for the late reply.

  1. How many samples were used in this analysis? Typically, the memory usage for one sample in one thread is about 1 GB. I am not sure why the memory was 'exhausted' in your analysis, but I suggest limiting the number of threads. For example, with 8 GB of memory and a 4-core CPU (with hyper-threading), the number of threads should be no more than 4. With 16 GB of memory, one could perhaps set the threads to 6.
  2. Of course, raising the S/N ratio may filter out some noisy peaks and increase the processing speed. The 'snthr' parameter can be used to define the S/N ratio in DecoMetDIA.
  3. To convert the raw data to 'mzXML' format, I suggest using the MSConvert program in the ProteoWizard software. I am afraid the conversion has to be done under Windows. I have never tried it on Linux/macOS, as the vendor data formats are not supported by the non-Windows versions of ProteoWizard.
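
The thread advice in point 1 can be turned into a rough rule of thumb. The sketch below is only an illustration: `suggest_threads` is a hypothetical helper, not part of DecoMetDIA, and the 2 GB-per-thread budget is an assumption chosen to leave headroom over the roughly 1 GB per sample per thread mentioned above. The 16 GB example in the reply (6 threads) is more conservative than this helper would be.

```r
# Hypothetical helper (not part of DecoMetDIA): derive a thread count from
# the available memory and the number of cores.
# gb_per_thread = 2 is an assumed budget, not a measured value.
suggest_threads <- function(mem_gb,
                            gb_per_thread = 2,
                            cores = parallel::detectCores()) {
  by_memory <- floor(mem_gb / gb_per_thread)
  # detectCores() can return NA on some platforms, hence na.rm = TRUE
  as.integer(max(1, min(by_memory, cores, na.rm = TRUE)))
}

suggest_threads(8, cores = 8)   # memory-bound: 8 GB / 2 GB per thread
suggest_threads(16, cores = 4)  # core-bound: capped at 4 cores
```

The result would then be passed to whatever thread argument your DecoMetDIA version exposes.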

    Regarding the warning from centWave, which I believe is used by xcms, is there some other way that I should be converting my wiff files into mzXML besides the default options from msConvert? Or is this safe to ignore? I'm working more on the data analysis side than the data collection side, so I'm not sure if this is an issue with how the data was collected or if the conversion needs some specific options to generate a centroided mzXML file.

Yes, the 'centWave' method is used for peak detection, and we only support converted centroid data rather than the raw data (like 'wiff', etc.). To obtain centroid data, add the 'peak picking' filter when converting the raw data to 'mzXML' with msConvert.
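
Before rerunning DecoMetDIA, it may help to confirm that the converted files really are centroided. The following is a minimal sketch, assuming MSnbase (already attached in the session above) is installed. `check_centroided` is a hypothetical helper name, and `isCentroided()` only makes a heuristic guess from the peak data, so treat the result as a hint rather than a guarantee.

```r
# Hypothetical helper: tabulate, per MS level, how many spectra look
# centroided in the given mzXML files.
check_centroided <- function(files) {
  # On-disk mode keeps memory usage low because spectra are read on demand
  raw <- MSnbase::readMSData(files, mode = "onDisk")
  # One logical value per spectrum; NA when the heuristic cannot decide
  guess <- MSnbase::isCentroided(raw)
  table(centroided = guess, msLevel = MSnbase::msLevel(raw), useNA = "ifany")
}

# Example call, using the directory name from the report above:
# files <- list.files("wiff_data", pattern = "\\.mzXML$", full.names = TRUE)
# check_centroided(files)
```

If most spectra come back as `FALSE`, the files are probably still in profile mode and should be converted again with peak picking enabled.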

    Also, as an aside, it looks like the package is changing the working directory rather than reading from the specified input directory and using the supplied input directory for all read/write operations. This causes the DecoMetDIA function to fail since it can't find the files after changing directories:

Yes, DecoMetDIA changes the working directory while processing the data and changes it back to the original one after the data are processed. However, the directory is not changed back when an error occurs, which should probably be corrected. Some of the loaded packages are required by DecoMetDIA, while others may be brought in by the required packages, such as xcms/CAMERA, etc. It is true that the loaded packages may overwrite other methods; there is currently no solution to prevent this.
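
For the working-directory problem, base R's `on.exit()` is one common way to restore the directory even when an error occurs. The sketch below is not DecoMetDIA's code; `with_working_dir` is a hypothetical wrapper that only illustrates the pattern.

```r
# Hypothetical wrapper: run `code` inside `dir` and always restore the
# previous working directory, even if `code` throws an error.
with_working_dir <- function(dir, code) {
  old <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)  # runs on normal exit and on error
  force(code)                      # `code` is evaluated lazily, i.e. inside `dir`
}

start <- getwd()
try(with_working_dir(tempdir(), stop("simulated failure")), silent = TRUE)
identical(getwd(), start)  # the original directory is restored after the error
```

Until something like this is in place inside the package, a user can get the same protection by wrapping the `DecoMetDIA::DecoMetDIA(...)` call in such a helper, or by passing absolute paths.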