compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Error parsing scan number from mgf file #82

Closed Hassan-1991 closed 8 months ago

Hassan-1991 commented 1 year ago

Hi, I'm trying to run ms2rescore on MaxQuant output, and here's the error I keep getting:

2022-10-25 08:34:42 // INFO // ms2rescore // Using MaxQuantPipeline.
2022-10-25 08:34:44 // WARNING // ms2rescore.maxquant // Removed 17537 non-rank 1 PSMs.
2022-10-25 08:34:45 // INFO // ms2rescore.parse_mgf // Parsing 2 MGF files to single MGF containing all PSMs.
  0%|                                                                                                                                                                               | 0/2 [00:00<?, ?it/s]
2022-10-25 08:34:45 // ERROR // ms2rescore.__main__ // Critical error occured in MS2ReScore
Traceback (most recent call last):
  File "/ms2rescore/__main__.py", line 15, in main
    rescore.run()
  File "/ms2rescore/__init__.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "/ms2rescore/id_file_parser.py", line 402, in get_peprec
    self.parse_mgf_files(peprec)
  File "/ms2rescore/id_file_parser.py", line 390, in parse_mgf_files
    mgf_title_pattern=self.mgf_title_pattern
  File "/ms2rescore/parse_mgf.py", line 109, in parse_mgf
    title = title_parser(line, mgf_title_pattern=mgf_title_pattern, method=title_parsing_method, run=run)
  File "/ms2rescore/parse_mgf.py", line 66, in title_parser
    f"Could not extract scan number from TITLE field: `{line.strip()}`"
ms2rescore.parse_mgf.ParseMGFError: Could not extract scan number from TITLE field: `TITLE=expt1.2.2.3`

I checked my mgf file, and the TITLE lines look like this:

TITLE=expt1.2.2.3
TITLE=expt1.4.4.3
TITLE=expt1.5.5.3
...

That is, for each scan, there are three numbers separated by dots after the experiment title. I converted the raw files to mgf with msconvert. I understand that the parse_mgf.py script expects a different TITLE format.

Would be grateful for any help!

ArthurDeclercq commented 1 year ago

Hi @Hassan-1991,

I'm sorry for the late reply. In the latest version we provide a parameter in the config file where you can change the regex that is used to extract the scan number from the title field. You could change `TITLE=.*scan=([0-9]+).*$` in the config to something like `TITLE=.*expt1.([0-9]+).*$`, assuming that the scan number is the first number after `expt1`. This scan number has to match the scan numbers in the MaxQuant msms file; otherwise this will result in mismatches. I hope this helps.
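To check a pattern before putting it in the config, a quick standalone test against a few TITLE lines can help. The snippet below is only a sketch: `extract_scan_number` is a hypothetical helper, not a function from ms2rescore, and it assumes the pattern is applied with `re.match` and that the first capture group is taken as the scan number.

```python
import re
from typing import Optional

# Pattern that expects "scan=<number>" somewhere in the TITLE line.
DEFAULT_PATTERN = r"TITLE=.*scan=([0-9]+).*$"
# Pattern suggested above for msconvert-style titles such as "expt1.2.2.3".
SUGGESTED_PATTERN = r"TITLE=.*expt1.([0-9]+).*$"


def extract_scan_number(line: str, pattern: str) -> Optional[str]:
    """Hypothetical helper: return the first capture group, or None if no match."""
    match = re.match(pattern, line.strip())
    return match.group(1) if match else None


# TITLE lines taken from the MGF file in this issue.
titles = ["TITLE=expt1.2.2.3", "TITLE=expt1.4.4.3", "TITLE=expt1.5.5.3"]

for title in titles:
    default_result = extract_scan_number(title, DEFAULT_PATTERN)
    suggested_result = extract_scan_number(title, SUGGESTED_PATTERN)
    print(f"{title}: default={default_result}, suggested={suggested_result}")
```

With these titles, the `scan=` pattern finds nothing, while the suggested pattern captures 2, 4 and 5. Note that the unescaped `.` after `expt1` matches any character; writing `expt1\.` instead would restrict it to a literal dot.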

Thank you for your patience!

Cheers, Arthur