r-cheologist opened this issue 3 years ago
The errors mentioned in the stack trace do not indicate the specific problem. However, you're using 30 threads and only 3500M (aka 3.5 GB) of memory. That, coupled with 20 dynamic modifications (variable mods), is just asking for trouble. Start smaller, then work up to more complex searches. Give Java more memory (java -Xmx12000M minimum), try fewer dynamic mods, and use 8 threads. Once you get that working, ramp up the number of dynamic mods and/or threads.
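The starting point suggested above (a larger Java heap, 8 threads) could be expressed as a launch command along these lines. This is only a sketch: `build_msgf_command` and the file names are hypothetical placeholders, and the function just assembles the argument list without running anything. `-Xmx` is a JVM option and must come before `-jar`, while `-s`, `-d`, `-conf`, and `-thread` are MS-GF+ arguments.

```python
import shlex


def build_msgf_command(spectra, fasta, conf, heap_mb=12000, threads=8,
                       jar="MSGFPlus.jar"):
    """Assemble (but do not run) an MS-GF+ command line as an argv list.

    Hypothetical helper: defaults follow the advice of at least 12000M of
    heap and 8 threads as a conservative first attempt.
    """
    return [
        "java",
        f"-Xmx{heap_mb}M",         # JVM maximum heap size; must precede -jar
        "-jar", jar,               # path to the MS-GF+ jar
        "-s", spectra,             # spectrum file, e.g. mzML
        "-d", fasta,               # protein database (FASTA)
        "-conf", conf,             # MS-GF+ parameter file
        "-thread", str(threads),   # number of concurrent search threads
    ]


# Placeholder file names for illustration only.
cmd = build_msgf_command("sample.mzML", "small_db.fasta", "params.txt")
print(shlex.join(cmd))
```

Once the placeholders point to real files, passing the list to `subprocess.run(cmd, check=True)` would launch the search. Raising `heap_mb` or `threads` later is then a one-argument change, which fits the "start small, then ramp up" approach.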
Thank you for your insight. I'm exploring and will report back.
I actually have memory to spare on this machine (512 GB, and 32 cores/64 logical processors). Can you share how to put that power to rational use in MS-GF+ processing? How many spectra, how large a database, how many dynamic mods, etc. require what amount of resources?
There is no set formula, other than "more memory is better" (presuming you're using 64-bit Java). I personally have never used more than ~20 GB (aka -Xmx20000M) but it should work fine with more. Keep in mind that for each dynamic mod added, the search space grows dramatically, as illustrated at https://en.wikipedia.org/wiki/Combinatorial_explosion. My general rule of thumb is to not exceed 6 dynamic mods in a single search. Yes, having a smaller FASTA file helps (small being less than 5 MB).
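The combinatorial growth can be made concrete with a simplified count. This is only an illustration, not a description of how MS-GF+ enumerates candidates internally. It ignores terminal constraints and precursor-mass filtering. Let $L$ be the peptide length, $m_j$ the number of optional (dynamic) mods allowed on residue $j$, and $N$ the cap on dynamic mods per peptide:

```latex
V_{\text{uncapped}} = \prod_{j=1}^{L} \left(1 + m_j\right),
\qquad
V_{N} = \sum_{\substack{S \subseteq \{1,\dots,L\} \\ |S| \le N}} \; \prod_{j \in S} m_j

% Worked example, L = 10:
%   m_j = 1 for all j:  V_uncapped = 2^{10} = 1024,
%                       V_3 = \binom{10}{0}+\binom{10}{1}+\binom{10}{2}+\binom{10}{3} = 1+10+45+120 = 176
%   m_j = 2 for all j:  V_uncapped = 3^{10} = 59049
```

Each additional mod type that can sit on a residue raises its factor $(1 + m_j)$, and that factor is multiplied across every residue of every peptide in the database. This is why the number of dynamic mods, and any per-peptide cap on them, dominates runtime and memory.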
Hi, I'm the first author of the aforementioned publication. Below I share important parameters to handle memory issues (discussed with @r-cheologist via email). This is also described in the Online Methods section.
Memory used in paper: -Xmx30G (sufficient with the H. sapiens Swiss-Prot database)
Number of tolerable termini: -ntt 2
Modification: -mod file.txt (details below)
NumMods=1
C2H3N1O1,C,fix,any,Carbamidomethyl
C9H12N2O6,ADEFGHIKLMNPQRSTVWY,opt,any,UridineNotCys
C7H9N1O5,C,opt,any,UridineCys
This way, MS-GF+ could handle 20 mods with modest memory usage. I guess the most critical parameter was NumMods (which you had set to 3).
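One rough way to see why NumMods matters so much with this mod set is to count the modified variants of a single peptide. Under the file above, every residue carries exactly one optional uridine mod: UridineNotCys covers the 19 non-Cys residues and UridineCys covers C. The sketch below is a simplified model with hypothetical helper names, not part of MS-GF+. It assumes NumMods caps the number of dynamic mods per peptide. It also ignores positional constraints, the `*` residue wildcard, and precursor-mass filtering.

```python
# Hypothetical helpers; the mod lines are copied from the post above.
MODS_TXT = """\
NumMods=1
C2H3N1O1,C,fix,any,Carbamidomethyl
C9H12N2O6,ADEFGHIKLMNPQRSTVWY,opt,any,UridineNotCys
C7H9N1O5,C,opt,any,UridineCys
"""


def optional_mods_per_residue(mods_txt):
    """Count the 'opt' (dynamic) modification entries that apply to each residue."""
    counts = {}
    for raw in mods_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or "=" in line:          # skip blanks and settings such as NumMods=1
            continue
        fields = [f.strip() for f in line.split(",")]
        residues, mod_type = fields[1], fields[2]
        if mod_type != "opt":
            continue                         # fixed mods shift mass but add no variants
        for aa in residues:
            counts[aa] = counts.get(aa, 0) + 1
    return counts


def count_variants(peptide, opt_counts, num_mods):
    """Count modified forms of `peptide` carrying at most `num_mods` dynamic mods."""
    ways = [1] + [0] * num_mods              # ways[i]: placements with exactly i mods
    for aa in peptide:
        m = opt_counts.get(aa, 0)            # optional mods possible on this residue
        for i in range(num_mods, 0, -1):     # iterate downward so each residue is used once
            ways[i] += m * ways[i - 1]
    return sum(ways)


counts = optional_mods_per_residue(MODS_TXT)
for n in (1, 3):
    print(f"NumMods={n}: {count_variants('PEPTIDEK', counts, n)} variants")
# NumMods=1: 9 variants
# NumMods=3: 93 variants
```

For the 8-residue example peptide, going from NumMods=1 to NumMods=3 multiplies the candidates roughly tenfold. For a 20-residue peptide the same change goes from 21 to 1351 variants, so the cap matters more than the number of mod lines in the file.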
Details
I am trying to follow the protocol of Bae, J.W., Kwon, S.C., Na, Y., Kim, V.N., and Kim, J.-S. (2020). Chemical RNA digestion enables robust RNA-binding site mapping at single amino acid resolution. Nature Structural & Molecular Biology 27, 678–682.
This involves a dynamic modification on ALL AAs, and I set out to experiment with just 3 raw files and a very small FASTA DB. What am I doing wrong that may lead to the following error:
Useful extras
java -Xmx3500M -jar MSGFPlus_v20210322\MSGFPlus.jar -s 20210415_JG_HF1_LC1200_mRNA_crosslinking_HFpilot_sample.mzML -d <...>_FASTA.fasta -conf MSGFPlus_Tryp_MetOx_StatCysAlk_UridineBae2020_20p0pmParTol.txt