MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

First time user, MANY Dynmods: MS-GF+ crashes #123

Open r-cheologist opened 3 years ago

r-cheologist commented 3 years ago

Details

I am trying to follow the protocol of Bae, J.W., Kwon, S.C., Na, Y., Kim, V.N., and Kim, J.-S. (2020). Chemical RNA digestion enables robust RNA-binding site mapping at single amino acid resolution. Nature Structural & Molecular Biology 27, 678–682.

This involves a dynamic modification on every amino acid, so I set out to experiment with just 3 raw files and a very small FASTA database. What am I doing wrong that could lead to the following error?

>java -Xmx3500M -jar MSGFPlus_v20210322\MSGFPlus.jar -s 20210415_JG_HF1_LC1200_mRNA_crosslinking_HFpilot_sample.mzML -d <...>_FASTA.fasta -conf MSGFPlus_Tryp_MetOx_StatCysAlk_UridineBae2020_20p0pmParTol.txt
MS-GF+ Release (v2021.03.22) (22 March 2021)
Java 1.8.0_292 (Amazon.com Inc.)
Windows Server 2012 R2 (amd64, version 6.3)
Loading database files...
Counting number of distinct peptides in <...>_FASTA.revCat.csarr using <...>_FASTA.revCat.cnlcp
Loading database finished (elapsed time: 0.14 sec)
Reading spectra...
Opening mzML file 20210415_JG_HF1_LC1200_mRNA_crosslinking_HFpilot_sample.mzML
Ignoring 0 profile spectra.
Ignoring 0 spectra having less than 10 peaks.
Reading spectra finished (elapsed time: 1129.91 sec)
Using 30 threads.
Search Parameters:
        PrecursorMassTolerance: 20.0 ppm
        IsotopeError: -1,1
        TargetDecoyAnalysis: true
        FragmentationMethod: As written in the spectrum or CID if no info
        Instrument: HighRes (Orbitrap/FTICR/Lumos)
        Enzyme: Tryp
        Protocol: Standard
        NumTolerableTermini: 2
        IgnoreMetCleavage: false
        MinPepLength: 6
        MaxPepLength: 50
        MinCharge: 2
        MaxCharge: 5
        NumMatchesPerSpec: 1
        MaxMissedCleavages: -1
        MaxNumModsPerPeptide: 3
        ChargeCarrierMass: 1.00727649 (proton)
        MinNumPeaksPerSpectrum: 10
        NumIsoforms: 128
Post translational modifications in use:
        Fixed (static):     Carbamidomethyl on C (+57.0215)
        Variable (dynamic): Oxidation on M (+15.9949)
        Variable (dynamic): UridineMinusCarbamidomethylation on C (+187.0481)
        Variable (dynamic): Uridine on A (+244.0695)
        Variable (dynamic): Uridine on R (+244.0695)
        Variable (dynamic): Uridine on N (+244.0695)
        Variable (dynamic): Uridine on D (+244.0695)
        Variable (dynamic): Uridine on Q (+244.0695)
        Variable (dynamic): Uridine on E (+244.0695)
        Variable (dynamic): Uridine on G (+244.0695)
        Variable (dynamic): Uridine on H (+244.0695)
        Variable (dynamic): Uridine on I (+244.0695)
        Variable (dynamic): Uridine on L (+244.0695)
        Variable (dynamic): Uridine on K (+244.0695)
        Variable (dynamic): Uridine on M (+244.0695)
        Variable (dynamic): Uridine on F (+244.0695)
        Variable (dynamic): Uridine on P (+244.0695)
        Variable (dynamic): Uridine on S (+244.0695)
        Variable (dynamic): Uridine on T (+244.0695)
        Variable (dynamic): Uridine on W (+244.0695)
        Variable (dynamic): Uridine on Y (+244.0695)
        Variable (dynamic): Uridine on V (+244.0695)

Spectrum 0-81269 (total: 81270)
Splitting work into 90 tasks.
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
        at edu.ucsd.msjava.msdbsearch.ScoredSpectraMap.makePepMassSpecKeyMap(ScoredSpectraMap.java:167)
        at edu.ucsd.msjava.msdbsearch.ConcurrentMSGFPlus$RunMSGFPlus.run(ConcurrentMSGFPlus.java:83)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[... the same NullPointerException trace is printed, interleaved, for 14 worker threads in total (pool-1-thread-1 through pool-1-thread-14) ...]
Search progress: 11 / 20 tasks, 55.00% 0.02 seconds elapsed
Search progress: 20 / 20 tasks, 100.00% 0.03 seconds elapsed
Search progress: 20 / 20 tasks, 100.00% 0.05 seconds elapsed
Search progress: 20 / 20 tasks, 100.00% 0.06 seconds elapsed
Search progress: 20 / 20 tasks, 100.00% 0.08 seconds elapsed
Search progress: 20 / 20 tasks, 100.00% 0.09 seconds elapsed
May 31, 2021 4:28:23 PM edu.ucsd.msjava.ui.MSGFPlus runMSGFPlus
SEVERE: null
java.util.concurrent.RejectedExecutionException: Task edu.ucsd.msjava.msdbsearch.ConcurrentMSGFPlus$RunMSGFPlus@b5325b2 rejected from edu.ucsd.msjava.misc.ThreadPoolExecutorWithExceptions@6721be82[Shutting down, pool size = 17, active threads = 16, queued tasks = 0, completed tasks = 4]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
        at edu.ucsd.msjava.misc.ThreadPoolExecutorWithExceptions.execute(ThreadPoolExecutorWithExceptions.java:67)
        at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:395)
        at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:113)
        at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:61)

[Error] Task terminated; results incomplete. Please run again.


alchemistmatt commented 3 years ago

The errors in the stack trace do not point to a specific problem. However, you're using 30 threads and only 3500M (3.5 GB) of memory. That, coupled with 21 dynamic modifications (variable mods), is just asking for trouble. Start smaller, then work up to more complex searches: give Java more memory (java -Xmx12000M at minimum), try fewer dynamic mods, and use 8 threads. Once that works, ramp up the number of dynamic mods and/or threads.
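A minimal Python sketch of that advice, assuming the `-thread` option from the MS-GF+ command-line documentation (`-s`, `-d`, and `-conf` appear in the command above). The file names, including the reduced-mod config `reduced_mods_conf.txt`, are hypothetical placeholders. Building the command as an argument list sidesteps shell-quoting problems with paths; the defaults mirror the suggested 12000M heap and 8 threads.

```python
import shlex


def build_msgfplus_command(jar, spectra, fasta, conf_file=None,
                           heap_mb=12000, threads=8):
    """Return an MS-GF+ invocation as an argv list for subprocess.run()."""
    if heap_mb <= 0 or threads <= 0:
        raise ValueError("heap_mb and threads must be positive")
    cmd = [
        "java", f"-Xmx{heap_mb}M",   # JVM heap ceiling, e.g. -Xmx12000M
        "-jar", str(jar),
        "-s", str(spectra),          # spectrum file (e.g. mzML)
        "-d", str(fasta),            # protein database
        "-thread", str(threads),     # cap worker threads instead of using every core
    ]
    if conf_file is not None:
        cmd += ["-conf", str(conf_file)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; substitute your own.
    cmd = build_msgfplus_command("MSGFPlus.jar", "sample.mzML", "proteins.fasta",
                                 conf_file="reduced_mods_conf.txt")
    print(shlex.join(cmd))
    # To launch the search: import subprocess; subprocess.run(cmd, check=True)
```

Ramping up then means re-running with a larger `heap_mb`, more `threads`, or a config file with more dynamic mods, one change at a time.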

r-cheologist commented 3 years ago

Thank you for your insight. I'm exploring and will report back.

r-cheologist commented 3 years ago

I actually have memory to spare on this machine (512 GB RAM, 32 cores / 64 logical processors). Can you share how to put that power to rational use in MS-GF+ processing? What resources does a given number of spectra, database size, number of dynamic mods, etc. require?

alchemistmatt commented 3 years ago

There is no set formula, other than "more memory is better" (presuming you're using 64-bit Java). I personally have never used more than ~20 GB (-Xmx20000M), but it should work fine with more. Keep in mind that each dynamic mod you add grows the search space dramatically, as illustrated at https://en.wikipedia.org/wiki/Combinatorial_explosion. My general rule of thumb is to not exceed 6 dynamic mods in a single search. And yes, a smaller FASTA file helps (small meaning less than 5 MB).
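A back-of-the-envelope Python sketch of that growth. It assumes each modified residue carries exactly one variable mod, caps the number of modified positions at the per-peptide limit (MaxNumModsPerPeptide: 3 in the log above, i.e. NumMods), and ignores terminal mods, charges, and isotope errors; it is not how MS-GF+ represents its search space internally. The count is a small dynamic program in which `ways[k]` holds the number of patterns with exactly k modified positions, updated residue by residue. `USER_MODS` and `PAPER_MODS` are hypothetical names for the mod sets in the crashing run and in the NumMods=1, 20-mod setup described later in this thread.

```python
def count_modified_variants(peptide, var_mods, max_mods):
    """Count modification patterns for one peptide sequence.

    var_mods maps a residue letter to how many variable mods can sit on it;
    at most max_mods positions are modified, one mod per modified position.
    """
    ways = [1] + [0] * max_mods  # ways[k]: patterns with exactly k modified positions
    for residue in peptide:
        options = var_mods.get(residue, 0)
        if options == 0:
            continue
        # Walk k downward so this residue contributes at most one mod per pattern.
        for k in range(max_mods, 0, -1):
            ways[k] += options * ways[k - 1]
    return sum(ways)


# Crashing run (from its log): Uridine on 19 residues, Oxidation on M,
# UridineMinusCarbamidomethylation on C. Fixed Carbamidomethyl is not counted.
USER_MODS = {aa: 1 for aa in "ARNDQEGHILKMFPSTWYV"}
USER_MODS["M"] += 1  # M carries both Oxidation and Uridine
USER_MODS["C"] = 1

# Paper's mods file: UridineNotCys on 19 residues plus UridineCys on C.
PAPER_MODS = {aa: 1 for aa in "ADEFGHIKLMNPQRSTVWY"}
PAPER_MODS["C"] = 1

if __name__ == "__main__":
    peptide = "LVNEVTEFAK"  # arbitrary 10-residue example
    print("crashing run:", count_modified_variants(peptide, USER_MODS, 3))
    print("paper setup: ", count_modified_variants(peptide, PAPER_MODS, 1))
```

For this example peptide the crashing run's settings give 176 candidate variants versus 11 with NumMods=1. For a 50-residue peptide (the MaxPepLength in the log) with every residue modifiable once, the same count is 20876 versus 51, before multiplying by every peptide in the database.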

jwbaebio commented 3 years ago

Hi, I'm the first author of the aforementioned publication. Below I share the parameters that matter most for handling the memory issues (discussed with @r-cheologist via email). They are also described in the paper's Online Methods section.

  1. Memory used in the paper: -Xmx30G (sufficient for the H. sapiens Swiss-Prot database)

  2. Number of tolerable termini: -ntt 2

  3. Modification: -mod file.txt (details written below)

    NumMods=1
    C2H3N1O1,C,fix,any,Carbamidomethyl
    C9H12N2O6,ADEFGHIKLMNPQRSTVWY,opt,any,UridineNotCys
    C7H9N1O5,C,opt,any,UridineCys

With these settings, MS-GF+ handled 20 variable mods with modest memory usage. I suspect the most critical parameter was NumMods, which was set to 3 in the crashing run (MaxNumModsPerPeptide: 3).