Closed ypriverol closed 2 years ago
That would be nice and would make it easier to run.
Currently, I run it as follows:
1. Run BLAST.
2. Take the BLAST output and generate a TSV containing only peptides with single amino acid changes.
3. Run SpectrumAI using the generated file to check the validity of the changed position:
Rscript SpectrumAI.R spemzml_dir psms_with_single_missmatch.tsv outputdir
@husensofteng :
From the PXD008841 dataset, the following two peptides were identified with one AA difference from the canonical proteins; SpectrumAI validated them as PASS and FAIL, respectively.
Pass: TIAECLADELINAAK (Canonical) TIAECLAEELINAAK (Variant) Spectra file: HJOSLO2U_20140703_TMTpool1_300ugIPG3-10_7of15ul_fr10.mzML
Fail: KAAAPTPEEEMDECEQALAAEPK (Variant) KAAAPAPEEEMDECEQALAAEPK (Canonical) Spectra file: HJOSLO2U_20140703_TMTpool1_300ugIPG3-10_7of15ul_fr08.mzML
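For reference, the changed residue in each pair above can be located with a few lines of Python. This is only a sketch: the function name `single_mismatch` is hypothetical, and the 1-based position convention is an assumption rather than something SpectrumAI or pypgatk prescribes.

```python
def single_mismatch(canonical: str, variant: str):
    """Return (position, canonical_aa, variant_aa) with a 1-based position.

    Returns None unless the two peptides have the same length and differ
    at exactly one residue.
    """
    if len(canonical) != len(variant):
        return None
    diffs = [
        (i + 1, c, v)
        for i, (c, v) in enumerate(zip(canonical, variant))
        if c != v
    ]
    return diffs[0] if len(diffs) == 1 else None


# The two peptide pairs reported above.
print(single_mismatch("TIAECLADELINAAK", "TIAECLAEELINAAK"))
print(single_mismatch("KAAAPAPEEEMDECEQALAAEPK", "KAAAPTPEEEMDECEQALAAEPK"))
```

For the PASS pair this gives D to E at position 8, and for the FAIL pair A to T at position 6.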
SpectrumAI (https://github.com/yafeng/SpectrumAI) is a tool that detects the b and y ions corresponding to a specific mutation. The original algorithm was implemented in R, but for better integration with the quantms pipeline and pypgatk it would be great to have an implementation in Python.
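To make "the corresponding b and y ions" concrete, here is a rough plain-Python sketch of which singly charged fragment ions sit on either side of a substituted residue. It assumes unmodified peptides (no TMT label or carbamidomethylation, which a real implementation for this dataset would need), standard monoisotopic residue masses, and a 1-based position. The helper `flanking_ions` is hypothetical and is not SpectrumAI's actual code.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def flanking_ions(peptide: str, position: int) -> dict:
    """Singly charged b/y ion m/z values flanking the residue at `position`.

    `position` is 1-based (an assumption). For a peptide of length n this
    yields b(position-1), b(position), y(n-position) and y(n-position+1),
    skipping indices that do not correspond to a real fragment.
    """
    n = len(peptide)
    masses = [MONO[aa] for aa in peptide]
    ions = {}
    # b ions: N-terminal fragments ending just before / at the residue.
    for i in (position - 1, position):
        if 1 <= i < n:
            ions[f"b{i}"] = sum(masses[:i]) + PROTON
    # y ions: C-terminal fragments starting just after / at the residue.
    for i in (n - position, n - position + 1):
        if 1 <= i < n:
            ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


for name, mz in flanking_ions("TIAECLAEELINAAK", 8).items():
    print(f"{name}\t{mz:.4f}")
```

Each consecutive pair (b7/b8 and y7/y8 here) differs by the mass of the substituted residue, which is why observing peaks on both sides supports the variant call.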
I suggest the following structure:
The command-line tool consumes a TSV file with the following columns:
canonical peptide | variant peptide | canonical aa | variant aa | position | spectra file | scan
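A minimal sketch of reading such a file with the Python standard library is shown below. The column names are taken from the list above; the `VariantPSM` class, the `read_variant_psms` function, the 1-based position convention, and the scan value in the demo row are all illustrative assumptions (the scan number in particular is a placeholder, not a real scan from the dataset).

```python
import csv
import io
from dataclasses import dataclass

COLUMNS = [
    "canonical peptide", "variant peptide", "canonical aa",
    "variant aa", "position", "spectra file", "scan",
]


@dataclass
class VariantPSM:
    canonical_peptide: str
    variant_peptide: str
    canonical_aa: str
    variant_aa: str
    position: int  # assumed 1-based
    spectra_file: str
    scan: int


def read_variant_psms(handle):
    """Parse the proposed TSV layout and sanity-check each row."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    psms = []
    for row in reader:
        psm = VariantPSM(
            row["canonical peptide"], row["variant peptide"],
            row["canonical aa"], row["variant aa"],
            int(row["position"]), row["spectra file"], int(row["scan"]),
        )
        # The stated residues must match the peptides at the given position.
        idx = psm.position - 1
        if (psm.canonical_peptide[idx] != psm.canonical_aa
                or psm.variant_peptide[idx] != psm.variant_aa):
            raise ValueError(f"inconsistent row: {row}")
        psms.append(psm)
    return psms


# Demo row built from the PASS peptide in this thread; scan "1" is a placeholder.
demo = "\t".join(COLUMNS) + "\n" + "\t".join([
    "TIAECLADELINAAK", "TIAECLAEELINAAK", "D", "E", "8",
    "HJOSLO2U_20140703_TMTpool1_300ugIPG3-10_7of15ul_fr10.mzML", "1",
]) + "\n"
psms = read_variant_psms(io.StringIO(demo))
print(psms[0])
```

Checking each row against its own peptides up front catches off-by-one position errors before any spectrum is loaded.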
Instead of using custom code to generate the theoretical spectra, I suggest using the OpenMS function for that:
example:
Reference: https://pyopenms.readthedocs.io/en/latest/theoreticalspectrumgenerator.html
@husensofteng, can you provide an example in this format of a valid variant and an invalid variant, including the mzML file?