Closed: nattzy94 closed this issue 3 years ago
Hi! As I mentioned in https://github.com/compomics/ms2pip_c/issues/89, MS²ReScore is the way to go for your use case of MS²PIP!
One remark, though. You mentioned:
I then take the unmatched spectra and search it against a small protein database.
I would definitely not do this. By (1) not including previously identified spectra in the second search and (2) searching against a smaller database that does not match the expected search space of the sample, the statistics of your target-decoy search and the subsequent false discovery rate (FDR) estimation will not add up. This will lead to either more false negative or more false positive identifications than estimated.
I recommend following the Search-All, Assess-Subset approach instead: search all spectra against a concatenated database of UniProt AND the small proteins, and then estimate the FDR on the subset (small proteins) only. This should give a properly estimated FDR for the small proteins you are searching for.
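To make the "assess subset" step concrete, here is a minimal sketch in Python. It assumes you already have PSMs from a search of all spectra against the concatenated target-decoy database, and that each PSM is flagged as belonging to the small-protein subset or not (for decoys: derived from a small-protein sequence). The `PSM` class and `subset_q_values` function are hypothetical names for illustration, not part of MS²ReScore or any search engine, and this is plain target-decoy counting restricted to the subset, without any more refined estimator.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float     # search engine score, higher is better (assumption)
    is_decoy: bool   # True if the match is to a decoy sequence
    in_subset: bool  # True if the matched (target or decoy) protein is a small protein


def subset_q_values(psms):
    """Return (psm, q_value) pairs for subset PSMs only, best score first."""
    # Assess subset: keep only PSMs that map to the small-protein subset.
    subset = sorted(
        (p for p in psms if p.in_subset), key=lambda p: p.score, reverse=True
    )

    # Running FDR estimate at each score threshold: decoys / targets.
    fdrs = []
    targets = decoys = 0
    for p in subset:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at which this PSM would still be accepted.
    q_values = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()

    return list(zip(subset, q_values))


# Toy data: all spectra were searched, only the subset is assessed.
psms = [
    PSM(10.0, False, True),
    PSM(9.0, False, False),
    PSM(8.0, False, True),
    PSM(7.0, True, True),
    PSM(6.0, False, True),
    PSM(5.0, True, False),
]
results = subset_q_values(psms)
for psm, q in results:
    print(psm.score, psm.is_decoy, round(q, 3))
```

The key point is that the filtering happens after the search, so spectra that are better explained by a regular UniProt protein have already been assigned there and do not compete for a small-protein match.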
About MS²ReScore and the error you encountered: it seems that the tool cannot find a modification (57.02147) in its configuration. Do you have a line with carbamidomethyl in your config file (ms2pip > modifications section)?
{"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.02200, "amino_acid":"C", "n_term":false},
If you do, and still get the error, I would change the mass shift to match the value listed in the error:
{"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.02147, "amino_acid":"C", "n_term":false},
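If you want to check quickly which configured modifications lie close to the mass in the error, something like the following sketch could help. It only reads the `ms2pip` > `modifications` list shown above; the function name `find_modifications` and the tolerance values are assumptions for illustration, and whether MS²ReScore itself matches masses exactly or within a tolerance is not something this example tells you.

```python
import json


def find_modifications(config, mass_shift, tolerance=0.001):
    """Return names of configured modifications within `tolerance` Da of `mass_shift`."""
    mods = config.get("ms2pip", {}).get("modifications", [])
    return [
        m["name"] for m in mods if abs(m["mass_shift"] - mass_shift) <= tolerance
    ]


# Inline stand-in for the config file; in practice: json.load(open(path)).
config = json.loads(
    """
    {"ms2pip": {"modifications": [
      {"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.02200,
       "amino_acid":"C", "n_term":false}
    ]}}
    """
)

loose = find_modifications(config, 57.02147)         # within 0.001 Da
strict = find_modifications(config, 57.02147, 1e-5)  # near-exact match
print(loose, strict)
```

With the first config line, the loose lookup finds Carbamidomethyl but the strict one does not, which is consistent with the tool not recognising 57.02147 if it compares masses strictly.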
I expect to release a significant update to MS²ReScore by the end of this week, which will include added features from DeepLC retention time prediction. So it might be worth waiting a few more days with your analysis until you can use the latest version of MS²ReScore.
Hi Ralf,
Thanks for the reply. The reason I used a sub-sub approach is that the MS experiment did not include steps for small protein enrichment. So, in order to avoid likely false positives from an all-all or all-sub approach, I first filtered against the UniProt proteins that I expect to form the majority.
Nevertheless, thanks for bringing up alternative methods of protein searching and assessment. I never thought to look into such methods since MS is not my main field of study. It will definitely be interesting to implement these alternatives, as my predictions are of pretty low quality right now.
On the main error issue, I followed your advice and changed my mass shift to match the one in the error. However, I now run into this error message:
Traceback (most recent call last):
File "/home/e0470749/miniconda2/envs/new_ms2rescore/bin/ms2rescore", line 8, in <module>
sys.exit(main())
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/__init__.py", line 124, in main
peprec_filename, mgf_filename = tandem_to_rescore.tandem_pipeline(config)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 152, in tandem_pipeline
make_pepfile(outname + "_edited.pin", config)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 83, in make_pepfile
write_PEPREC(pepfile, path_to_pin)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/ms2rescore/tandem_to_rescore.py", line 96, in write_PEPREC
pepfile_tosave = pepfile.loc[:, ['TITLE', 'modifications', 'peptide', 'Charge']]
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1762, in __getitem__
return self._getitem_tuple(key)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1289, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1954, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1595, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
self._validate_read_indexer(
File "/home/e0470749/miniconda2/envs/new_ms2rescore/lib/python3.8/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
raise KeyError(
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
Do I have to install an older version of pandas for this to work? In the meantime, I will wait for the new ms2rescore update since it will be available very soon.
Thank you!
P.S. Sorry for all the edits, I was trying various things to troubleshoot.
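A note on the traceback above: the `KeyError` is raised because at least one of the four columns requested in `write_PEPREC` is absent from the table. Older pandas releases only warned in that situation and filled the missing columns with NaN, so downgrading would probably hide the problem rather than fix it. A small check like the sketch below shows which columns are missing; `missing_columns` is a hypothetical helper, not part of ms2rescore, and the `available` list is made-up example data.

```python
# Column names taken from the failing line in the traceback.
REQUIRED = ["TITLE", "modifications", "peptide", "Charge"]


def missing_columns(available, required=REQUIRED):
    """Return the required column names that are not among `available`."""
    available = set(available)
    return [name for name in required if name not in available]


# Example with made-up column names; with a DataFrame, pass `df.columns`.
available = ["TITLE", "peptide", "Charge", "Label"]
missing = missing_columns(available)
print(missing)
```

With a pandas DataFrame, `missing_columns(pepfile.columns)` run just before the failing `.loc` call would list the absent columns.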
Can you try the version in https://github.com/compomics/ms2rescore/pull/12#issue-450077964? It's still in development, but a lot has changed since the version you are running. You can download the zip file linked in the post, unpack it, and install the Python package (wheel) directly with pip install ms2rescore-0.3.0.dev1-py3-none-any.whl.
For the configuration file, you can use something like this:
{
"general":{
"pipeline":"tandem",
"feature_sets":["all"],
"run_percolator":true,
"keep_tmp_files":false,
"show_progress_bar":true,
"num_cpu":-1
},
"ms2pip": {
"model": "HCD",
"frag_error": 0.02,
"modifications": [
{"name":"Glu->pyro-Glu", "unimod_accession":27, "mass_shift":-18.0153, "amino_acid":"E", "n_term":true},
{"name":"Gln->pyro-Glu", "unimod_accession":28, "mass_shift":-17.0305, "amino_acid":"Q", "n_term":true},
{"name":"Acetyl", "unimod_accession":1, "mass_shift":42.0367, "amino_acid":null, "n_term":true},
{"name":"Oxidation", "unimod_accession":35, "mass_shift":15.9994, "amino_acid":"M", "n_term":false},
{"name":"Carbamidomethyl", "unimod_accession":4, "mass_shift":57.0513, "amino_acid":"C", "n_term":false},
{"name":"Pyro-carbamidomethyl", "unimod_accession":26, "mass_shift":39.994915, "amino_acid":"C", "n_term":false},
{"name":"Deamidated", "unimod_accession":7, "mass_shift":0.984016, "amino_acid":"N", "n_term":false}
]
}
}
Thanks for the updated version. I tried it using ms2rescore -m /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf -c /gpfs/eplab/Nathaniel/config_tandem.json /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml
The config file was copied from the above comment. However, this gives the error message:
2020-07-17 16:05:54 - INFO - Using tandem pipeline
Pin-converter version 3.05.0, Build Date Jul 9 2020 20:45:05
Copyright (c) 2013 Lukas Käll. All rights reserved.
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
tandem2pin -P REVERSED /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml
on compute-9-10
Reading /gpfs/eplab/Nathaniel/56h_non_validated_PSMs_concatenated.mgf_xtandem.xml
Traceback (most recent call last):
File "/home/e0470749/miniconda2/envs/ms2rescore_2/bin/ms2rescore", line 8, in <module>
sys.exit(main())
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/__main__.py", line 8, in main
run()
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/runner.py", line 57, in run
peprec_filename, mgf_filename = pipeline(config)
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/tandem_to_rescore.py", line 55, in tandem_pipeline
modification_mapping=modification_mapping
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 51, in __init__
self.modification_mapping = modification_mapping
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 67, in modification_mapping
mod_labels = [key[1] for key in value.keys()]
File "/home/e0470749/miniconda2/envs/ms2rescore_2/lib/python3.7/site-packages/ms2rescore/percolator.py", line 67, in <listcomp>
mod_labels = [key[1] for key in value.keys()]
TypeError: 'float' object is not subscriptable
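For readers following along, the last frame of this traceback indexes every key of a mapping with `key[1]`, so it fails as soon as a key is a plain float. The sketch below reproduces only that Python behaviour. The two dictionaries are hypothetical shapes chosen for illustration; the structure that ms2rescore actually expects for `modification_mapping` is not shown in this thread.

```python
def mod_labels(mapping):
    # Same expression as in the traceback: every key is indexed at position 1.
    return [key[1] for key in mapping.keys()]


# Hypothetical mappings, for illustration only.
tuple_keys = {("C", "57.02147"): "Carbamidomethyl"}  # keys can be indexed
float_keys = {57.02147: "Carbamidomethyl"}           # keys are bare floats

labels = mod_labels(tuple_keys)
print(labels)

try:
    mod_labels(float_keys)
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)
```

This suggests the mapping reaching that line was keyed by mass values alone, but that is an inference from the traceback, not a confirmed diagnosis.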
Hi @nattzy94, Do you still have this issue in the latest version?
You can install the latest MS²ReScore version from PyPI: pip install ms2rescore[deeplc]
I am trying to analyze my mass spec data for the presence of small proteins. I started by searching my mgf files against Uniprot annotated and reviewed proteins. I then take the unmatched spectra and search it against a small protein database. This reduces the chances of false positives arising.
I did the search using X!Tandem in R, which outputs an xml file that I then use as input for ms2rescore with the following command: ms2rescore -m /gpfs/eplab/Nathaniel/ms-analysis/unmatched_spectra_from_uniprot_search/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf -c /gpfs/eplab/Nathaniel/ms2rescore-master/config_tandem.json /gpfs/eplab/Nathaniel/b014p014_P_56h_M5_rep1_non_validated_PSMs.mgf_xtandem.xml
However, the following error message comes up:
In addition, the following files are created after running ms2rescore:
Would be great if someone could point out where I've gone wrong.