compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

IDFileParserError: Could not map all MGF retention times to spectrum indices (using MSGF+ pipeline) #79

Closed (aarmso closed 11 months ago)

aarmso commented 2 years ago

I am running MSGF+ with the following command:

java -Xmx32g -jar MSGFPlus.jar -s Yeast_trypsin.mgf -d database.fasta -tda 1 -ti 0,2 -minLength 7 -maxLength 50 -mod Mods.txt -o Yeast_trypsin.mzid -addFeatures 1

Then I run ms2rescore with the following command:

ms2rescore -c config.json -m Yeast_trypsin.mgf Yeast_trypsin.mzid

The config file is the default one, except that it specifies msgfplus as the pipeline.

This gets me the following error:

2022-09-22 16:25:32 // INFO // ms2rescore // Using MSGFPipeline.
2022-09-22 16:25:32 // INFO // ms2rescore.percolator // Running Percolator PIN converter

Pin-converter version 3.05.0, Build Date Aug 31 2020 19:06:15
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
msgf2pin -P XXX -o /tmp/tmpj3fvb69r/Yeast_trypsin_original.pin Yeast_trypsin.mzid
Uses features for fragment spectra mass errors
Reading Yeast_trypsin.mzid
2022-09-22 16:25:47 // ERROR // ms2rescore.__main__ // Critical error occured in MS2ReScore
Traceback (most recent call last):
  File "/home/aawa/.local/lib/python3.8/site-packages/ms2rescore/__main__.py", line 15, in main
    rescore.run()
  File "/home/aawa/.local/lib/python3.8/site-packages/ms2rescore/__init__.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "/home/aawa/.local/lib/python3.8/site-packages/ms2rescore/id_file_parser.py", line 245, in get_peprec
    return self.peprec_from_pin()
  File "/home/aawa/.local/lib/python3.8/site-packages/ms2rescore/id_file_parser.py", line 191, in peprec_from_pin
    raise IDFileParserError(
ms2rescore.id_file_parser.IDFileParserError: Could not map all MGF retention times to spectrum indices.

Any idea what could be causing this? I have tried multiple MGF files from different studies, all searched with MSGF+, and every one gives the same error.

Thanks for any help.
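Since the error message concerns retention times read from the MGF file, one possible first check is whether every spectrum in the file actually has an RTINSECONDS header. The sketch below is only a diagnostic under that assumption: the thread does not confirm that missing retention times are the cause, and the function names `iter_mgf_headers` and `spectra_missing_rt` are hypothetical helpers, not part of ms2rescore.

```python
from typing import Dict, Iterable, Iterator, List


def iter_mgf_headers(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the KEY=VALUE header fields of each BEGIN IONS ... END IONS block."""
    current = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS":
            if current is not None:
                yield current
            current = None
        elif current is not None and "=" in line and not line[0].isdigit():
            # Header line such as TITLE=... or RTINSECONDS=...; peak lines
            # start with a digit and are skipped.
            key, _, value = line.partition("=")
            current[key.strip().upper()] = value.strip()


def spectra_missing_rt(lines: Iterable[str]) -> List[str]:
    """Return titles (or positional placeholders) of spectra lacking RTINSECONDS."""
    missing = []
    for index, header in enumerate(iter_mgf_headers(lines)):
        if not header.get("RTINSECONDS"):
            missing.append(header.get("TITLE", f"<untitled spectrum {index}>"))
    return missing
```

To use it, open the MGF file (for example Yeast_trypsin.mgf) and pass the file handle to `spectra_missing_rt`. An empty list means every spectrum carries a retention time, in which case the problem lies elsewhere.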

mvdenbog commented 2 years ago

It turns out that downgrading to ms2rescore==2.0.0 is a usable work-around for this issue.

A worked example follows, running both ms2rescore v2.1.3 and v2.0.0.

It uses the example files for the Velos005137 dataset from https://github.com/compomics/ms2rescore/tree/master/examples/

Get Velos005137 mgf file:

$ wget https://github.com/compomics/ms2rescore/raw/master/examples/mgf/Velos005137.mgf.zip
$ unzip Velos005137.mgf.zip

Get Pyrococcus furiosus proteome:

$ wget -O Pyrococcus_furiosus_UP000001013.fasta "https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=%28%28proteome%3AUP000001013%29%29"

Get MSGFPlus modifications file:

$ wget https://raw.githubusercontent.com/compomics/ms2rescore/master/examples/parameters/msgfplus_modifications.txt

Run MSGFPlus on mgf and fasta file:

$ java -jar /opt/MSGFPlus/MSGFPlus.jar -mod msgfplus_modifications.txt -s Velos005137.mgf -d Pyrococcus_furiosus_UP000001013.fasta -o Velos005137.mzid -t 10ppm -tda 1 -m 3 -inst 1 -minLength 8 -minCharge 2 -maxCharge 4 -n 1 -e 1 -addFeatures 1 -protocol 0 -thread 10

Get config file for ms2rescore:

$ wget https://raw.githubusercontent.com/compomics/ms2rescore/pub/config.json

Run ms2rescore v2.1.3 (FAILS):

$ ms2rescore --version
2.1.3
$ percolator -h 2>&1 | grep -i version
Percolator version 3.06.0, Build Date Sep 23 2022 15:10:40

$ ms2rescore -t velos_out -c config.json -m Velos005137.mgf -o velos_out Velos005137.mzid
2022-09-28 19:07:22 // INFO // ms2rescore // Using MSGFPipeline.
2022-09-28 19:07:22 // INFO // ms2rescore.percolator // Running Percolator PIN converter

Pin-converter version 3.06.0, Build Date Sep 23 2022 15:11:07
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
msgf2pin -P XXX -o /scratch/velos_out/Velos005137_original.pin /scratch/Velos005137.mzid
Uses features for fragment spectra mass errors
Reading /scratch/Velos005137.mzid
2022-09-28 19:07:42 // ERROR // ms2rescore.__main__ // Critical error occured in MS2ReScore
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/ms2rescore/__main__.py", line 15, in main
    rescore.run()
  File "/usr/local/lib/python3.9/site-packages/ms2rescore/__init__.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "/usr/local/lib/python3.9/site-packages/ms2rescore/id_file_parser.py", line 245, in get_peprec
    return self.peprec_from_pin()
  File "/usr/local/lib/python3.9/site-packages/ms2rescore/id_file_parser.py", line 191, in peprec_from_pin
    raise IDFileParserError(
ms2rescore.id_file_parser.IDFileParserError: Could not map all MGF retention times to spectrum indices.

Run ms2rescore v2.0.0 (SUCCEEDS):

$ ms2rescore --version
2.0.0
$ percolator -h 2>&1 | grep -i version
Percolator version 3.02.1, Build Date Sep 28 2022 14:40:07

$ ms2rescore -t velos_out -c config.json -m Velos005137.mgf -o velos_out Velos005137.mzid
2022-09-28 19:09:03 // INFO // ms2rescore // Using MSGFPipeline.
2022-09-28 19:09:03 // INFO // ms2rescore.percolator // Running Percolator PIN converter

Pin-converter version 3.02.1, Build Date Sep 28 2022 14:40:18
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
msgf2pin -P XXX -o /scratch/velos_out/Velos005137_original.pin /scratch/Velos005137.mzid
Uses features for fragment spectra mass errors
Reading /scratch/Velos005137.mzid
2022-09-28 19:09:23 // INFO // ms2rescore // Adding MS2 peak intensity features with MS²PIP.
2022-09-28 19:09:23 // INFO // ms2pip // using HCD models
2022-09-28 19:09:25 // INFO // ms2pip // scanning spectrum file...
(21)500 (8)500 (20)500 (4)500 (5)500 (23)500 (14)500 (0)500 (17)500 (10)500 (6)500 (12)500 (11)500 (18)500 (7)500 (16)500 (2)500 (9)500 (1)500 (13)500 (15)500 (22)500 (19)500 (3)500 
2022-09-28 19:11:56 // INFO // ms2pip // writing file velos_out/Velos005137_HCD_pred_and_emp.csv...
2022-09-28 19:11:57 // INFO // ms2rescore // Calculating features from predicted spectra

2022-09-28 19:12:08 // INFO // ms2rescore // Adding retention time features with DeepLC.
2022-09-28 19:12:13 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
2022-09-28 19:12:15 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
1/1 [==============================] - 1s 809ms/step
2022-09-28 19:12:17 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
2022-09-28 19:12:20 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
1/1 [==============================] - 1s 648ms/step
2022-09-28 19:12:22 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
2022-09-28 19:12:25 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
1/1 [==============================] - 1s 697ms/step
2022-09-28 19:12:27 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
2022-09-28 19:12:30 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 1509
1/1 [==============================] - 1s 702ms/step
2022-09-28 19:12:32 // INFO // deeplc.deeplc // Going to predict retention times for this amount of identifiers: 7668
2/2 [==============================] - 3s 744ms/step
/usr/local/lib/python3.8/site-packages/pygam/utils.py:113: UserWarning: Expected 2D input data array, but found 1D. Expanding to 2D.
  warnings.warn(msg)
2/2 [==============================] - 2s 462ms/step
2/2 [==============================] - 2s 564ms/step
2022-09-28 19:12:43 // INFO // ms2rescore // Generating PIN files
  complete_df, feature_set[cols_to_use], on=on_cols
  complete_df, feature_set[cols_to_use], on=on_cols
2022-09-28 19:12:45 // INFO // ms2rescore // Running Percolator: percolator velos_out_searchengine_ms2pip_rt_features.pin -m velos_out_searchengine_ms2pip_rt_features.pout -M velos_out_searchengine_ms2pip_rt_features.pout_dec -w velos_out_searchengine_ms2pip_rt_features.weights -v 0 -U --post-processing-tdc

2022-09-28 19:14:34 // INFO // ms2rescore // Generating Rescore plots
2022-09-28 19:14:34 // INFO // ms2rescore // MS²ReScore finished!

ms2rescore v2.0.0 runs to completion with the MSGF+ pipeline. (Both versions run fine with the MaxQuant pipeline.)

HTH

RalfG commented 11 months ago

Hi @aarmso, @ericmalekos, happy to say that this should now be fixed in v3.0 of MS²Rescore.
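As a rough summary of what this thread reports for the MSGF+ pipeline, the sketch below maps a version string to the status mentioned above. It only encodes the three data points given here (v2.0.0 works, v2.1.3 fails, v3.0 and later should be fixed). The behavior of any other version is not stated in the thread, and `parse_version` and `msgf_pipeline_status` are hypothetical helper names.

```python
def parse_version(text: str) -> tuple:
    """Turn '2.1.3' into (2, 1, 3), padding to three parts; suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '3.0' and '3.0.0' compare equal.
    return (tuple(parts) + (0, 0, 0))[:3]


def msgf_pipeline_status(version: str) -> str:
    """Report what this thread says about the MSGF+ pipeline for a version."""
    v = parse_version(version)
    if v[0] >= 3:
        return "reported fixed"
    if v == (2, 1, 3):
        return "reported failing"
    if v == (2, 0, 0):
        return "reported working"
    # Assumption: anything else is simply unknown from this thread.
    return "not reported in this thread"
```

The installed version can be obtained with `ms2rescore --version`, as shown earlier in the thread, or from Python with `importlib.metadata.version("ms2rescore")`.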