compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

Allow rescoring of multiple hits per spectrum #83

Closed JB91451 closed 2 months ago

JB91451 commented 1 year ago

Dear all,

Thank you for creating this nice tool.

I recently tried to rescore some Comet results (pin files). During the searches I usually set "num_output_lines" to a value greater than 1 so that lower-than-best scoring results are exported as well; this often improves score adjustment by the TPP/Prophets pipeline. Unfortunately, these lower-ranking hits are also written to the Percolator files and result in the error below. The same might be true for other search engines that can output such hits. The error traces back to the _get_spectrum_index_column method in percolator.py, where the pattern string discards the charge and hit-rank information in the spectrum identifier (e.g. scans like ..._623_2_1; ..._623_2_2; ... all become spec id 623). Would it be possible to discard these lower-ranking hits automatically and just issue a warning instead? I think this would be the cleanest solution, as I am not sure whether Percolator can handle the information properly.

Best, Juergen

The error is:

Traceback (most recent call last):
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\__main__.py", line 15, in main
    rescore.run()
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\__init__.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\id_file_parser.py", line 224, in get_peprec
    return self.peprec_from_pin()
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\id_file_parser.py", line 179, in peprec_from_pin
    peprec = self.original_pin.to_peptide_record(
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\percolator.py", line 470, in to_peptide_record
    peprec_df["spec_id"] = self._get_spectrum_index_column(
  File "C:\Programs\Python310\lib\site-packages\ms2rescore\percolator.py", line 270, in _get_spectrum_index_column
    raise PercolatorInError("Issue in matching spectrum IDs, duplicates found.")
ms2rescore.percolator.PercolatorInError: Issue in matching spectrum IDs, duplicates found.
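To illustrate the collision and the requested behaviour, here is a minimal sketch. It is not MS²Rescore code: the identifier layout `<prefix>_<scan>_<charge>_<rank>` is only assumed from the examples above, and the names `SPEC_ID_PATTERN`, `parse_spec_id`, and `keep_top_rank` are hypothetical. It shows how keeping only the scan number produces duplicates, and how lower-ranking hits could be dropped with a warning instead of an error.

```python
import re
import warnings
from collections import Counter

# Assumed SpecId layout (taken from the examples in this report, not from the
# MS2Rescore source): <prefix>_<scan>_<charge>_<rank>
SPEC_ID_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<scan>\d+)_(?P<charge>\d+)_(?P<rank>\d+)$"
)


def parse_spec_id(spec_id):
    """Return (scan, charge, rank) parsed from a spectrum identifier."""
    match = SPEC_ID_PATTERN.match(spec_id)
    if match is None:
        raise ValueError(f"Unexpected spectrum identifier: {spec_id!r}")
    return int(match["scan"]), int(match["charge"]), int(match["rank"])


def keep_top_rank(spec_ids):
    """Keep only rank-1 hits and warn about the ones that were dropped."""
    kept = [s for s in spec_ids if parse_spec_id(s)[2] == 1]
    dropped = len(spec_ids) - len(kept)
    if dropped:
        warnings.warn(f"Discarded {dropped} lower-ranking hit(s).")
    return kept


spec_ids = ["run1_623_2_1", "run1_623_2_2", "run1_624_3_1"]

# Reducing each identifier to its scan number makes scan 623 appear twice,
# which is the kind of duplicate the error message complains about.
scans = [parse_spec_id(s)[0] for s in spec_ids]
duplicated_scans = [scan for scan, n in Counter(scans).items() if n > 1]

# After filtering to rank 1, every scan number is unique again.
top_rank_ids = keep_top_rank(spec_ids)
```

With the three example identifiers, `duplicated_scans` contains only scan 623, and `top_rank_ids` keeps the two rank-1 entries.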

ArthurDeclercq commented 1 year ago

Hi @JB91451,

Thank you for using MS²Rescore! We are aware that rescoring multiple ranks per spectrum is currently not possible. We are working on a major refactoring of MS²Rescore in which these issues will be addressed, so you will be able to provide lower-rank PSMs as well without getting an error!

Thank you for your patience!

RalfG commented 2 months ago

Control of multi-rank PSM rescoring is now fully implemented in v3.1.0:

https://ms2rescore.readthedocs.io/en/v3.1.0/userguide/configuration/#multi-rank-rescoring