compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

ms2rescore error: ms2rescore/tandem_to_rescore.py", line 71, in make_pepfile ; KeyError: '57.02147' #11

Closed nattzy94 closed 3 years ago

nattzy94 commented 4 years ago

I am trying to analyze my mass spec data for the presence of small proteins. I started by searching my mgf files against Uniprot annotated and reviewed proteins. I then take the unmatched spectra and search it against a small protein database. This reduces the chances of false positives arising.

I ran the search using X!Tandem in R, which outputs an XML file that I then use as input to ms2rescore with the following command:

ms2rescore -m /gpfs/eplab/Nathaniel/ms-analysis/unmatched_spectra_from_uniprot_search/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf -c /gpfs/eplab/Nathaniel/ms2rescore-master/config_tandem.json /gpfs/eplab/Nathaniel/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem.xml

However, the following error message comes up:

Pin-converter version 3.04.0, Build Date Mar 11 2020 14:07:14
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
tandem2pin -P DECOY /gpfs/eplab/Nathaniel/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem.xml
on compute-9-18
Reading /gpfs/eplab/Nathaniel/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem.xml
2020-07-08 18:11:24 - INFO - Fixing tabs on pin file
2020-07-08 18:11:24 - INFO - Adding mgf TITLE to pin file
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_4460_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_5373_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_8981_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_10890_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_12455_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_13836_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_20088_3_1
No match found in mzid file for SpecId b014p014_P_56h_M5_rep1_non_validated_PSMs_24356_3_1
2020-07-08 18:11:24 - INFO - Writing PEPREC file
Traceback (most recent call last):
  File "/home/e0470749/miniconda2/envs/ms2rescore/bin/ms2rescore", line 8, in <module>
    sys.exit(main())
  File "/home/e0470749/miniconda2/envs/ms2rescore/lib/python3.6/site-packages/ms2rescore/__init__.py", line 124, in main
    peprec_filename, mgf_filename = tandem_to_rescore.tandem_pipeline(config)
  File "/home/e0470749/miniconda2/envs/ms2rescore/lib/python3.6/site-packages/ms2rescore/tandem_to_rescore.py", line 152, in tandem_pipeline
    make_pepfile(outname + "_edited.pin", config)
  File "/home/e0470749/miniconda2/envs/ms2rescore/lib/python3.6/site-packages/ms2rescore/tandem_to_rescore.py", line 71, in make_pepfile
    mod.append("{}|{}".format(m.start(), modifications[str(float(m.group(1)))]))
KeyError: '57.02147'

In addition, the following files are created after running ms2rescore:

b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem_original.pin
b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem_edited.pin
b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem_edited.peprecpnomod

Would be great if someone could point out where I've gone wrong.

RalfG commented 4 years ago

Hi! As I mentioned in https://github.com/compomics/ms2pip_c/issues/89, MS²ReScore is the way to go for your use case of MS²PIP!


One remark, though. You mentioned:

I then take the unmatched spectra and search it against a small protein database.

I would definitely not do this. By (1) not including previously identified spectra in the second search and (2) searching against a smaller database that does not match the expected search space of the sample, the statistics of your target-decoy search and the subsequent false discovery rate (FDR) estimation are not going to add up. This will lead either to more false negative identifications or to more false positive identifications than estimated.

I do recommend following the Search-All, Assess-Subset approach, where you search all spectra against a concatenated database of UniProt and the small proteins, and then estimate the FDR on the subset (small proteins) only. This should give a properly estimated FDR for the small proteins you are searching for.
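
As a rough illustration of the "search all, assess subset" idea, the sketch below computes a plain target-decoy FDR estimate restricted to the subset of interest. The `PSM` layout, the `subset_fdr` helper, and the example scores are all hypothetical. It uses a simple decoys/targets ratio, which is a simplification and not necessarily the estimator used in the Search-All, Assess-Subset publication or by Percolator.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    score: float      # search engine score; higher is better (assumption)
    is_decoy: bool    # True if the match is to a decoy sequence
    in_subset: bool   # True if the matched protein (or its decoy) is in the subset


def subset_fdr(psms, threshold):
    """Estimate the FDR among subset PSMs scoring at or above `threshold`."""
    # All spectra were searched against the full concatenated database;
    # only the FDR calculation is restricted to the subset.
    accepted = [p for p in psms if p.in_subset and p.score >= threshold]
    decoys = sum(p.is_decoy for p in accepted)
    targets = len(accepted) - decoys
    if targets == 0:
        return None  # no target hits, so the ratio is undefined
    return decoys / targets


# Made-up PSMs from a single search against UniProt + small proteins.
psms = [
    PSM(9.1, False, True),
    PSM(8.4, False, False),
    PSM(7.9, False, True),
    PSM(7.2, True, True),
    PSM(6.5, False, True),
    PSM(6.0, True, False),
    PSM(5.8, False, True),
]

print(subset_fdr(psms, threshold=5.0))
```

The key point is that the filtering to small proteins happens after the search, not before it, so the decoy hits still reflect the full search space.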


About MS²ReScore and the error you encountered: it seems like the tool cannot find a modification (57.02147) in its configuration. Do you have a line with carbamidomethyl in your config file (ms2pip > modifications section)?

      {"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.02200, "amino_acid":"C", "n_term":false},

If you do, and still get the error, I would change the mass shift to match the value listed in the error:

      {"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.02147, "amino_acid":"C", "n_term":false},
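
The `KeyError: '57.02147'` suggests that the modification is looked up by the exact string form of its mass shift, so a configured value of `57.022` cannot match the `57.02147` found in the X!Tandem output. Below is a minimal sketch of that failure, followed by a tolerance-based lookup as one possible alternative. How the lookup table is actually built is an assumption based on the traceback, and `find_modification` with its 0.01 Da tolerance is hypothetical, not part of MS²ReScore.

```python
config_mods = [
    {"name": "Carbamidomethyl", "mass_shift": 57.02200, "amino_acid": "C"},
    {"name": "Oxidation", "mass_shift": 15.9994, "amino_acid": "M"},
]

# Assumption: the table is keyed by the string form of each mass shift,
# mirroring the `modifications[str(float(...))]` call in the traceback.
exact_lookup = {str(float(m["mass_shift"])): m["name"] for m in config_mods}

observed = "57.02147"  # mass shift as it appears in the error message
print(str(float(observed)) in exact_lookup)  # exact string match fails


def find_modification(mass_shift, mods, tolerance=0.01):
    """Return the name of the closest configured modification within tolerance."""
    best = min(mods, key=lambda m: abs(m["mass_shift"] - mass_shift))
    if abs(best["mass_shift"] - mass_shift) <= tolerance:
        return best["name"]
    return None  # nothing close enough is configured


print(find_modification(float(observed), config_mods))
```

With an exact-match lookup, editing the config value to the number reported in the error is the simplest fix, which is what is suggested above.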

I do expect to release a significant update to MS²ReScore by the end of this week, which will include added features from DeepLC retention time prediction. So it might be worth waiting a few more days with your analysis until you can use the latest version of MS²ReScore.

nattzy94 commented 4 years ago

Hi Ralf,

Thanks for the reply. The reason I used a sub-sub approach is that the MS experiment did not include steps for small protein enrichment. So, in order to avoid likely false positives from an all-all or all-sub approach, I filtered against the UniProt proteins that I expect to form the majority.

Nevertheless thanks for bringing up alternative methods of protein searching and assessment. I never thought to look into such methods since MS is not my main field of study. It will definitely be interesting to implement these alternatives as my predictions are of pretty low quality right now.

On the main error issue, I followed your advice and changed my mass shift to match the one in the error. However, I now run into this error message:

Traceback (most recent call last):
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/bin/ms2rescore", line 8, in <module>
    sys.exit(main())
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/__init__.py", line 124, in main
    peprec_filename, mgf_filename = tandem_to_rescore.tandem_pipeline(config)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 152, in tandem_pipeline
    make_pepfile(outname + "_edited.pin", config)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 83, in make_pepfile
    write_PEPREC(pepfile, path_to_pin)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 96, in write_PEPREC
    pepfile_tosave = pepfile.loc[:, ['TITLE', 'modifications', 'peptide', 'Charge']]
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1762, in __getitem__
    return self._getitem_tuple(key)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1289, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1954, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1595, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
    self._validate_read_indexer(
  File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
    raise KeyError(
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

Do I have to install an older version of pandas for this to work? In the meantime, I will wait for the new ms2rescore update since it will be available very soon.

Thank you!

P.S. Sorry for all the edits, I was trying various things to troubleshoot.
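
For context on the pandas `KeyError` in the traceback: recent pandas versions raise it when `.loc` receives a list of column labels and at least one of them is absent from the DataFrame. Older versions filled the gap with NaN instead, so a downgrade would probably only hide a missing column. The sketch below shows one way to find out which column is missing. The DataFrame is made up, and only the column names are taken from the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the parsed pin file; 'modifications' is left out
# on purpose to reproduce the situation.
pepfile = pd.DataFrame({
    "TITLE": ["spec_1", "spec_2"],
    "peptide": ["ACDK", "MCPR"],
    "Charge": [2, 3],
})

wanted = ["TITLE", "modifications", "peptide", "Charge"]
missing = [col for col in wanted if col not in pepfile.columns]
print("Missing columns:", missing)

try:
    pepfile.loc[:, wanted]
except KeyError as err:
    print("KeyError:", err)

# reindex tolerates missing labels and fills them with NaN, which makes the
# gap visible instead of raising.
print(pepfile.reindex(columns=wanted))
```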

RalfG commented 4 years ago

Can you try the version in https://github.com/compomics/ms2rescore/pull/12#issue-450077964? It's still in development, but a lot has changed since the version you are running. You can download the zip file linked in the post, unpack it, and install the Python package (wheel) directly with pip install ms2rescore-0.3.0.dev1-py3-none-any.whl.

For the configuration file, you can use something like this:

{
  "general":{
    "pipeline":"tandem",
    "feature_sets":["all"],
    "run_percolator":true,
    "keep_tmp_files":false,
    "show_progress_bar":true,
    "num_cpu":-1
  },
  "ms2pip": {
    "model": "HCD",
    "frag_error": 0.02,
    "modifications": [
      {"name":"Glu->pyro-Glu", "unimod_accession":27, "mass_shift":-18.0153, "amino_acid":"E", "n_term":true},
      {"name":"Gln->pyro-Glu", "unimod_accession":28, "mass_shift":-17.0305, "amino_acid":"Q", "n_term":true},
      {"name":"Acetyl", "unimod_accession":1, "mass_shift":42.0367, "amino_acid":null, "n_term":true},
      {"name":"Oxidation", "unimod_accession":35, "mass_shift":15.9994, "amino_acid":"M", "n_term":false},
      {"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.0513, "amino_acid":"C", "n_term":false},
      {"name":"Pyro-carbamidomethyl", "unimod_accession":26, "mass_shift":39.994915, "amino_acid":"C", "n_term":false},
      {"name":"Deamidated", "unimod_accession":7, "mass_shift":0.984016, "amino_acid":"N", "n_term":false}
    ]
  }
}

nattzy94 commented 4 years ago

Thanks for the updated version. I tried it using ms2rescore -m /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf -c /gpfs/eplab/Nathaniel/config_tandem.json /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml

The config file was copied from the above comment. However, this gives the error message:

2020-07-17 16:05:54 - INFO - Using tandem pipeline

Pin-converter version 3.05.0, Build Date Jul  9 2020 20:45:05
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
tandem2pin -P REVERSED /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml
on compute-9-10
Reading /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml
Traceback (most recent call last):
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/bin/ms2rescore", line 8, in <module>
    sys.exit(main())
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/__main__.py", line 8, in main
    run()
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/runner.py", line 57, in run
    peprec_filename, mgf_filename = pipeline(config)
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/tandem_to_rescore.py", line 55, in tandem_pipeline
    modification_mapping=modification_mapping
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 51, in __init__
    self.modification_mapping = modification_mapping
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 67, in modification_mapping
    mod_labels = [key[1] for key in value.keys()]
  File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 67, in <listcomp>
    mod_labels = [key[1] for key in value.keys()]
TypeError: 'float' object is not subscriptable
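
The `TypeError` originates from `key[1]`, which only works if every key of the modification mapping is itself indexable (a tuple, for example). A bare float key fails exactly as shown in the traceback. The sketch below reproduces this with made-up mappings. The `(amino_acid, mass_shift)` tuple layout is purely an assumption for illustration and not necessarily the structure MS²ReScore expects.

```python
# Mapping keyed by bare floats: indexing a key raises TypeError.
float_keyed = {57.0513: "Carbamidomethyl", 15.9994: "Oxidation"}

error_message = None
try:
    labels = [key[1] for key in float_keyed.keys()]
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Hypothetical mapping keyed by (amino_acid, mass_shift) tuples: key[1] works.
tuple_keyed = {("C", 57.0513): "Carbamidomethyl", ("M", 15.9994): "Oxidation"}
labels = [key[1] for key in tuple_keyed.keys()]
print(labels)
```

This points to a mismatch between the shape of the mapping built from the config and the shape the Percolator-related code expects, rather than to a problem in the input data.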

RalfG commented 3 years ago

Hi @nattzy94, do you still have this issue in the latest version?

You can install the latest MS²ReScore version from PyPI: pip install ms2rescore[deeplc]