compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Not able to run ms2rescore through the command line with this repo's examples #89

Closed: rolivella closed this issue 8 months ago

rolivella commented 1 year ago

Hello

I'm trying to run ms2rescore through the command line with the example dataset available in the ms2rescore repo:

MGF file: https://github.com/compomics/ms2rescore/blob/master/examples/mgf/20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02.mgf.zip

MZID file: https://github.com/compomics/ms2rescore/blob/master/examples/id/msgfplus.mzid

And this config file: config.zip

If I run it on Ubuntu 22.04 with this command:

    ms2rescore -c config.json -m 20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02.mgf msgfplus.mzid

I get the following error. Am I missing a file? Thanks.

2023-07-04 14:37:49 // INFO // ms2rescore // Using MSGFPipeline.
2023-07-04 14:37:49 // INFO // ms2rescore.percolator // Running Percolator PIN converter

Pin-converter version 3.06.1, Build Date Jun 15 2023 14:57:04
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
msgf2pin -P XXX -o /tmp/tmp3sn1zv2g/20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02_original.pin /home/proteomics/mysoftware/compomics/mgf/20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02.mzid

Uses features for fragment spectra mass errors
Reading /home/proteomics/mysoftware/compomics/mgf/20161213_NGHF_DBJ_SA_Exp3A_HeLa_1ug_7min_15000_02.mzid
No scan number was found for a PSM (or it equaled 0), scans are ranked from 1 and up
2023-07-04 14:38:01 // ERROR // ms2rescore.__main__ // Critical error occured in MS2ReScore
Traceback (most recent call last):
  File "/home/proteomics/.local/lib/python3.10/site-packages/ms2rescore/__main__.py", line 15, in main
    rescore.run()
  File "/home/proteomics/.local/lib/python3.10/site-packages/ms2rescore/__init__.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "/home/proteomics/.local/lib/python3.10/site-packages/ms2rescore/id_file_parser.py", line 245, in get_peprec
    return self.peprec_from_pin()
  File "/home/proteomics/.local/lib/python3.10/site-packages/ms2rescore/id_file_parser.py", line 191, in peprec_from_pin
    raise IDFileParserError(
ms2rescore.id_file_parser.IDFileParserError: Could not map all MGF retention times to spectrum indices.
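The error says that ms2rescore could not map MGF retention times to spectrum indices. One quick diagnostic is to list which spectra in the MGF file carry no RTINSECONDS field. The sketch below is not what ms2rescore checks internally; it is a minimal pure-Python reader that assumes a standard MGF layout (BEGIN IONS / END IONS blocks with KEY=value header lines), and the function names and example spectra are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def iter_mgf_headers(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the header fields (TITLE, RTINSECONDS, ...) of each spectrum in an MGF file."""
    header = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            header = {}
        elif line == "END IONS":
            if header is not None:
                yield header
            header = None
        elif header is not None and "=" in line:
            # Header lines are KEY=value; peak lines ("m/z intensity") contain no "=".
            key, value = line.split("=", 1)
            header[key.upper()] = value


def spectra_missing_rt(lines: Iterable[str]) -> List[int]:
    """Return the 0-based indices of spectra that have no RTINSECONDS field."""
    return [i for i, h in enumerate(iter_mgf_headers(lines)) if "RTINSECONDS" not in h]


# Hypothetical two-spectrum MGF; the second spectrum lacks a retention time.
example = """BEGIN IONS
TITLE=spectrum 1
RTINSECONDS=12.3
PEPMASS=500.25
CHARGE=2+
100.1 20.0
END IONS
BEGIN IONS
TITLE=spectrum 2
PEPMASS=600.30
CHARGE=2+
110.1 15.0
END IONS
""".splitlines()

print(spectra_missing_rt(example))  # [1]
```

For a real file, pass the open file handle instead, e.g. `with open(path) as f: spectra_missing_rt(f)`.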
rolivella commented 9 months ago

Hello again, I've just reviewed my own issue and I'm a little closer to the solution. From the documentation

https://ms2rescore.readthedocs.io/en/latest/userguide/configuration/#mapping-psms-to-spectra

I understand that I should add something like this to the JSON config file:

    "spectrum_id_pattern": ".*scan=(\\d+)$",
    "psm_id_pattern": ".*\\..*\\.(.*)"

However, my mzid file has no spectrum_id field at all, so how can I parse the PSMs with the psm_id_pattern?
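The doubled backslashes in these patterns are JSON escaping: once the config file is parsed, spectrum_id_pattern is the regex `.*scan=(\d+)$`, whose first capture group is the scan number. The sketch below checks this with Python's standard json and re modules. The spectrum title and PSM ID are hypothetical and may not match the formats actually used in the example files.

```python
import json
import re

# The two entries quoted above, written exactly as they would appear in config.json.
# A raw string keeps the doubled backslashes, so json.loads treats them as JSON escapes.
config_text = r'''{
    "spectrum_id_pattern": ".*scan=(\\d+)$",
    "psm_id_pattern": ".*\\..*\\.(.*)"
}'''
patterns = json.loads(config_text)

spectrum_re = re.compile(patterns["spectrum_id_pattern"])
psm_re = re.compile(patterns["psm_id_pattern"])

# After JSON parsing, each "\\" has become a single backslash.
print(spectrum_re.pattern)  # .*scan=(\d+)$
print(psm_re.pattern)       # .*\..*\.(.*)

# Hypothetical Thermo-style spectrum title; the example MGF may use a different format.
title = "controllerType=0 controllerNumber=1 scan=4215"
match = spectrum_re.match(title)
print(match.group(1) if match else "no match")  # 4215

# psm_id_pattern captures whatever follows the last dot (the ID needs at least two dots).
print(psm_re.match("sample.123.456").group(1))  # 456
```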

Thanks!

rolivella commented 9 months ago

I modified the JSON config file with the correct regex, but it still does not work:

Or:

    "spectrum_id_pattern": ".*scan=(\\d+)$",
    "psm_id_pattern": "mzspec.*scan=(\\d+)$"
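As the linked documentation suggests, the two patterns are meant to pull a shared key (here, a scan number) out of the spectrum IDs and the PSM IDs so that the two can be matched. A candidate pair of patterns can be tested offline before rerunning. The sketch below is not ms2rescore's own implementation: it assumes the first capture group of each pattern is the key, and the function names and all identifiers are hypothetical.

```python
import re
from typing import Iterable, List, Optional


def extract_key(pattern: "re.Pattern[str]", identifier: str) -> Optional[str]:
    """Return the first capture group of `pattern` for `identifier`, or None if it does not match."""
    match = pattern.match(identifier)
    return match.group(1) if match else None


def unmapped_psm_ids(
    spectrum_ids: Iterable[str],
    psm_ids: Iterable[str],
    spectrum_id_pattern: str,
    psm_id_pattern: str,
) -> List[str]:
    """List the PSM IDs whose extracted key matches no extracted spectrum key."""
    spectrum_re = re.compile(spectrum_id_pattern)
    psm_re = re.compile(psm_id_pattern)
    spectrum_keys = {extract_key(spectrum_re, s) for s in spectrum_ids}
    spectrum_keys.discard(None)  # spectra the pattern did not match contribute no key
    # A PSM is unmapped if its pattern fails (key None) or its key has no spectrum.
    return [p for p in psm_ids if extract_key(psm_re, p) not in spectrum_keys]


# Hypothetical identifiers, for illustration only.
spectra = [
    "controllerType=0 controllerNumber=1 scan=101",
    "controllerType=0 controllerNumber=1 scan=102",
]
psms = ["mzspec:example:run1:scan=101", "mzspec:example:run1:scan=999", "index=5"]

print(unmapped_psm_ids(spectra, psms, r".*scan=(\d+)$", r"mzspec.*scan=(\d+)$"))
# ['mzspec:example:run1:scan=999', 'index=5']
```

An empty result means every PSM ID found a spectrum under that pair of patterns; any leftovers show which identifier format the patterns fail to cover.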

RalfG commented 8 months ago

Hi @widmersimon,

Could you retry with the latest beta version of MS²Rescore 3.0? You can install it with:

    pip install ms2rescore --pre

Matching IDs from the PSM file to the spectrum files is always a bit tricky, as there is no single solution. We tried to streamline the process with v3.0.

Let us know if the issue persists and we'll gladly help you out!

rolivella commented 8 months ago

@RalfG thanks! I'll try it and let you know!

rolivella commented 8 months ago

@RalfG It works like a charm! Thank you very much.

RalfG commented 8 months ago

That's great to hear! Thank you so much for the update!