MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

MGF Parsing Error #54

Closed: verheytb closed this issue 5 years ago

verheytb commented 5 years ago

Hi,

I am reading an MGF file that begins with:

BEGIN IONS
TITLE=Spectrum_10
PEPMASS=675.248962402344
CHARGE=2+
RTINSECONDS=0.08354222375
SCANS=10
RAWFILE=/raw_files/sample_F01_042.raw
124.67236 956.8577
124.7327 1428.0652
124.7569 1408.8286
124.76196 1120.6285
124.76843 1933.553
124.77164 1746.6324
124.78518 1970.1355
124.78899 2219.085
124.79262 1223.6348
124.79761 1223.9208
124.80852 925.6152
124.82049 1126.6803
124.82349 1660.1154
...

Here is the output from MS-GF+:

MS-GF+ Release (v2017.07.21) (21 July 2017)
Loading database files...
Warning: Sequence database contains 29 counts of letter 'U', which does not correspond to an amino acid.
Warning: Sequence database contains 2 counts of letter 'X', which does not correspond to an amino acid.
Loading database finished (elapsed time: 3.16 sec)
Reading spectra...
java.lang.NullPointerException
        at edu.ucsd.msjava.parser.MgfSpectrumParser.getSpecMetaInfoMap(MgfSpectrumParser.java:182)
        at edu.ucsd.msjava.msutil.SpectraMap.<init>(SpectraMap.java:22)
        at edu.ucsd.msjava.msutil.SpectraAccessor.getSpecMap(SpectraAccessor.java:52)
        at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:217)
        at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:105)
        at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:56)

I can't find anything in the MGF that violates the spec. The MGF was produced with RawTools.

Ted

alchemistmatt commented 5 years ago

I used RawTools to convert one of our datasets to a .mgf file. I'm finding that numerous MS2 spectra have PEPMASS=0 even though the .raw file has an m/z value for the precursor. Does RawTools use PEPMASS=0 for poor-quality spectra? MS-GF+ requires a precursor m/z value for every MS2 spectrum, so it will effectively skip spectra with PEPMASS=0.
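To see how many spectra in a converted file are affected, the precursor values can be checked before running a search. The sketch below is a hypothetical helper, not part of MS-GF+ or RawTools. It assumes the usual MGF layout of `BEGIN IONS` / `END IONS` blocks with `KEY=value` header lines, and it flags any spectrum whose `PEPMASS` is missing, unparseable, or not positive. Only the first token of `PEPMASS` is read, because that field may also carry an intensity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists MGF spectra that lack a usable precursor m/z. */
public class MgfPepmassCheck {

    /** Returns the TITLE (or "spectrum #n") of each spectrum whose PEPMASS is missing or <= 0. */
    static List<String> spectraWithoutPrecursor(List<String> lines) {
        List<String> flagged = new ArrayList<>();
        boolean inIons = false;
        String title = null;
        Double pepMass = null;
        int index = 0; // 1-based count of BEGIN IONS blocks seen so far
        for (String raw : lines) {
            // Drop a leading U+FEFF (decoded UTF-8 BOM); String.trim() does not remove it
            if (!raw.isEmpty() && raw.charAt(0) == '\uFEFF') {
                raw = raw.substring(1);
            }
            String line = raw.trim();
            if (line.equals("BEGIN IONS")) {
                inIons = true;
                title = null;
                pepMass = null;
                index++;
            } else if (line.equals("END IONS")) {
                if (inIons && (pepMass == null || pepMass <= 0)) {
                    flagged.add(title != null ? title : "spectrum #" + index);
                }
                inIons = false;
            } else if (inIons && line.startsWith("TITLE=")) {
                title = line.substring("TITLE=".length());
            } else if (inIons && line.startsWith("PEPMASS=")) {
                // PEPMASS may be "m/z [intensity]"; only the first token is the precursor m/z
                String value = line.substring("PEPMASS=".length()).trim().split("\\s+")[0];
                try {
                    pepMass = Double.parseDouble(value);
                } catch (NumberFormatException e) {
                    pepMass = null; // treat an unparseable value like a missing precursor
                }
            }
        }
        return flagged;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String t : spectraWithoutPrecursor(lines)) {
            System.out.println("No usable precursor m/z: " + t);
        }
    }
}
```

Usage: `java MgfPepmassCheck sample.mgf`. Each spectrum it prints is one that MS-GF+ cannot score by precursor mass.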

The NullPointerException is a different matter: it was caused by the .mgf file being Unicode-encoded and starting with a byte order mark (BOM). I will update the code to account for this.
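A BOM can break a line-based parser because Java's UTF-8 decoder does not strip it. The bytes `EF BB BF` are decoded as the character U+FEFF, so the first line reads `"\uFEFFBEGIN IONS"` and no longer matches `"BEGIN IONS"`. The following is a minimal sketch of one way to tolerate this. It is not the actual MS-GF+ patch, and the class and method names are hypothetical. The sketch opens the stream as UTF-8 and consumes a single leading U+FEFF before any lines are read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch: open an MGF stream as UTF-8, skipping a leading byte order mark. */
public class BomTolerantReader {

    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        // The JDK's UTF-8 decoder keeps a BOM as U+FEFF, so check the first char ourselves
        reader.mark(1);
        int first = reader.read();
        if (first != '\uFEFF') {
            reader.reset(); // no BOM (or empty stream): rewind so nothing is lost
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             BufferedReader reader = open(in)) {
            // With or without a BOM, this should print "BEGIN IONS" for a well-formed MGF
            System.out.println("First line: " + reader.readLine());
        }
    }
}
```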

alchemistmatt commented 5 years ago

Until we can get this fixed, you can add a blank line to the start of your .mgf file and it should analyze properly. However, when I tested this, the spectra with PEPMASS=0 triggered a new, separate bug:

java.lang.NegativeArraySizeException
        at edu.ucsd.msjava.msscorer.FastScorer.<init>(FastScorer.java:21)
        at edu.ucsd.msjava.msdbsearch.ScoredSpectraMap.preProcessIndividualSpectra(ScoredSpectraMap.java:234)
        at edu.ucsd.msjava.msdbsearch.ScoredSpectraMap.preProcessSpectra(ScoredSpectraMap.java:190)
        at edu.ucsd.msjava.msdbsearch.ScoredSpectraMap.preProcessSpectra(ScoredSpectraMap.java:182)
        at edu.ucsd.msjava.msdbsearch.ConcurrentMSGFPlus$RunMSGFPlus.run(ConcurrentMSGFPlus.java:87)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

verheytb commented 5 years ago

Hi Matt,

I used RawTools 1.2.0 to parse Thermo RAW data from a Fusion Orbitrap machine, but I'm not getting PEPMASS=0 in my MGF files or the NegativeArraySizeException errors you are having. If it helps, I didn't use RawTools' intensity filters when generating MGF.

Thanks for your help in finding and fixing the byte order mark issue. I ended up using the following command to strip it from all my files:

sed -i '1s/^\xEF\xBB\xBF//' *.mgf

Ted

alchemistmatt commented 5 years ago

This should be fixed with the latest release; please try processing the original .mgf files using https://github.com/MSGFPlus/msgfplus/releases/tag/v2019.01.22