Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

Open search returns non-tryptic peptides because Crystal-C adjusted a tryptic one to a semi-tryptic one. #1279

Closed: pisistrato closed this issue 11 months ago

pisistrato commented 11 months ago

I am running a simple open search with default settings, and many of the peptides returned in the table are not fully tryptic.

This is taken from interact.pep.xml:

<search_result>
<search_hit peptide="PIDFLEAK" massdiff="-0.00161" calc_neutral_pep_mass="931.50137" peptide_next_aa="G" num_missed_cleavages="0" num_tol_term="1" protein_descr="MYPROTEIN OS=MYO GN=MYP PE=1 SV=1" num_tot_proteins="1" tot_num_ions="16" hit_rank="1" num_matched_ions="10" protein="sp|X00XX0|MAYP1_MYO" peptide_prev_aa="N" is_rejected="0">
<search_score name="hyperscore" value="19.4"/>
<search_score name="nextscore" value="15.573"/>
<search_score name="expect" value="1.106497e-04"/>
<ptm_result localization="1" best_score_with_ptm="19.400997" score_without_ptm="13.815165" localization_peptide="pIDFLEAK" second_best_score_with_ptm="17.234322" ptm_mass="-114.04468"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.9997" all_ntt_prob="(0.0000,0.9997,0.9980)">
<search_score_summary>
<parameter name="fval" value="7.8319"/>
<parameter name="ntt" value="1"/>
<parameter name="nmc" value="0"/>
<parameter name="massd" value="-0.002"/>
</search_score_summary>
</peptideprophet_result>
</analysis_result>
</search_hit>
</search_result>

I checked the database (including the reversed sequences), and peptide PIDFLEAK maps only to X00XX0, but there the preceding amino acid is N (--TIMERSSFEKNPIDFLEAK--).
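The num_tol_term="1" (ntt) value in the search_hit above follows directly from the flanking residues. A minimal Python sketch, assuming a plain trypsin rule (cleave after K or R); the function name and the optional proline_rule switch are hypothetical illustrations, not FragPipe or MSFragger settings. Flanking residues use the pepXML convention, with "-" marking a protein terminus.

```python
# Minimal sketch: count tryptic termini (pepXML "num_tol_term") for a peptide,
# given its flanking residues (peptide_prev_aa / peptide_next_aa).
# Assumption: trypsin cleaves after K or R; proline_rule is a hypothetical
# switch for the "not before P" variant.

CLEAVE_AFTER = frozenset({"K", "R"})


def num_tol_term(peptide: str, prev_aa: str, next_aa: str,
                 proline_rule: bool = False) -> int:
    """Return 0, 1 or 2: how many peptide termini are consistent with trypsin."""
    # N-terminus is tryptic if the peptide starts the protein, or the
    # preceding residue is K/R (and, with the proline rule, the peptide
    # does not start with P).
    n_term = prev_aa == "-" or (
        prev_aa in CLEAVE_AFTER and not (proline_rule and peptide[0] == "P")
    )
    # C-terminus is tryptic if the peptide ends the protein, or its last
    # residue is K/R (and, with the proline rule, the next residue is not P).
    c_term = next_aa == "-" or (
        peptide[-1] in CLEAVE_AFTER and not (proline_rule and next_aa == "P")
    )
    return int(n_term) + int(c_term)


# PIDFLEAK as reported (prev N, next G) versus the sequence extended by the
# preceding N, which is itself preceded by K in --TIMERSSFEKNPIDFLEAK--.
print(num_tol_term("PIDFLEAK", prev_aa="N", next_aa="G"))   # 1 (semi-tryptic)
print(num_tol_term("NPIDFLEAK", prev_aa="K", next_aa="G"))  # 2 (fully tryptic)
```

Only the C-terminal K counts for PIDFLEAK, which is why it is flagged as semi-tryptic.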

Why is this happening? Am I missing something here?

fcyu commented 11 months ago

Could you share your log file, fasta file, mzML file, and the interact.pep.xml file with us?

Thanks,

Fengchao

pisistrato commented 11 months ago

Hi @fcyu, you can get the files here: https://syncandshare.lrz.de/getlink/fiSVzs4hxWfckRFRnyUACU/

fcyu commented 11 months ago

Somehow I couldn't download the files. Could you upload them to https://www.dropbox.com/request/flObzxZqiLXxLWwFJVZk?

Thanks,

Fengchao

pisistrato commented 11 months ago

Done, thanks!

fcyu commented 11 months ago

Could you also upload the fasta file?

pisistrato commented 11 months ago

I thought I did, sorry. It is now there.

fcyu commented 11 months ago

I think I may have found the reason. To confirm my hypothesis, could you upload your *.pepXML files (the ones without the interact- prefix)?

pisistrato commented 11 months ago

Great. The file should be there now.

fcyu commented 11 months ago

Thanks for your prompt response, and the files.

I have confirmed that it was due to Crystal-C. Taking your PSM as an example, the following is the MSFragger output (in Exp1_0268_R0095-01_S004226_G_B01_LFQ_02.pepXML):

<spectrum_query start_scan="16040" uncalibrated_precursor_neutral_mass="931.5009" assumed_charge="2" spectrum="Exp1_0268_R0095-01_S004226_G_B01_LFQ_02.16040.16040.2" spectrumNativeID="controllerType=0 controllerNumber=1 scan=16040" end_scan="16040" index="11979" precursor_neutral_mass="931.49976" retention_time_sec="1512.6200866699219">
<search_result>
<search_hit peptide="NPIDFLEAK" massdiff="-114.0445556640625" calc_neutral_pep_mass="1045.5443" peptide_next_aa="G" num_missed_cleavages="0" num_tol_term="2" protein_descr="CRISPR-associated endonuclease Cas9/Csn1 OS=Streptococcus pyogenes serotype M1 OX=301447 GN=cas9 PE=1 SV=1" num_tot_proteins="1" tot_num_ions="16" hit_rank="1" num_matched_ions="10" protein="sp|Q99ZW2|CAS9_STRP1" peptide_prev_aa="K" is_rejected="0">
<search_score name="hyperscore" value="19.4"/>
<search_score name="nextscore" value="15.573"/>
<search_score name="expect" value="1.106497e-04"/>
<ptm_result localization="1_2" best_score_with_ptm="19.400997" score_without_ptm="13.815165" localization_peptide="npIDFLEAK" second_best_score_with_ptm="17.234322" ptm_mass="-114.04468"/>
</search_hit>
</search_result>
</spectrum_query>

Crystal-C then changed the peptide to the following (in Exp1_0268_R0095-01_S004226_G_B01_LFQ_02_c.pepXML):

<spectrum_query start_scan="16040" uncalibrated_precursor_neutral_mass="931.5009" assumed_charge="2" spectrum="Exp1_0268_R0095-01_S004226_G_B01_LFQ_02.16040.16040.2" end_scan="16040" index="11979" precursor_neutral_mass="931.49976" retention_time_sec="1512.6200866699219">
<search_result>
<search_hit peptide="PIDFLEAK" massdiff="-0.00161" calc_neutral_pep_mass="931.50137" peptide_next_aa="G" num_missed_cleavages="0" num_tol_term="1" protein_descr="CRISPR-associated endonuclease Cas9/Csn1 OS=Streptococcus pyogenes serotype M1 OX=301447 GN=cas9 PE=1 SV=1" num_tot_proteins="1" tot_num_ions="16" hit_rank="1" num_matched_ions="10" protein="sp|Q99ZW2|CAS9_STRP1" peptide_prev_aa="N" is_rejected="0">
<search_score name="hyperscore" value="19.4"/>
<search_score name="nextscore" value="15.573"/>
<search_score name="expect" value="1.106497e-04"/>
<ptm_result localization="1" best_score_with_ptm="19.400997" score_without_ptm="13.815165" localization_peptide="pIDFLEAK" second_best_score_with_ptm="17.234322" ptm_mass="-114.04468"/>
</search_hit>
</search_result>
</spectrum_query>

That is why you saw PIDFLEAK in the interact.pep.xml file. The rationale for changing NPIDFLEAK to PIDFLEAK is that 1) there is always a small fraction of semi-tryptic peptides in a sample, and 2) the delta mass (about -114.04 Da) is large and equals the mass of an N (asparagine) residue. After removing the N, the mass difference becomes -0.00161 Da.
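The arithmetic behind this can be checked with a short Python sketch. It assumes standard monoisotopic residue masses (only the residues needed here) and the precursor_neutral_mass of 931.49976 from the spectrum_query above. The trim_n_term helper and its 0.01 Da tolerance are hypothetical; they illustrate only the N-terminal case discussed here and are not Crystal-C's actual algorithm.

```python
# Monoisotopic residue masses (Da), limited to residues in this example.
RESIDUE_MASS = {
    "A": 71.037114, "D": 115.026943, "E": 129.042593, "F": 147.068414,
    "I": 113.084064, "K": 128.094963, "L": 113.084064, "N": 114.042927,
    "P": 97.052764,
}
WATER = 18.010565  # added once for the peptide termini


def neutral_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def trim_n_term(peptide: str, precursor_mass: float,
                tol: float = 0.01, max_trim: int = 3):
    """Hypothetical simplified reassignment: try dropping 0..max_trim
    N-terminal residues and return the first sequence whose mass
    difference is within tol Da; otherwise keep the original peptide."""
    for k in range(0, max_trim + 1):
        candidate = peptide[k:]
        if not candidate:
            break
        diff = precursor_mass - neutral_mass(candidate)
        if abs(diff) <= tol:
            return candidate, diff
    return peptide, precursor_mass - neutral_mass(peptide)


precursor = 931.49976  # precursor_neutral_mass of scan 16040
print(f"{precursor - neutral_mass('NPIDFLEAK'):.3f}")  # -114.045
print(f"{precursor - neutral_mass('PIDFLEAK'):.4f}")   # -0.0017
print(trim_n_term("NPIDFLEAK", precursor)[0])           # PIDFLEAK
```

The computed differences are close to the massdiff values in the two pepXML records; the small gap from -0.00161 presumably reflects slightly different mass constants or rounding in MSFragger.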

Best,

Fengchao

pisistrato commented 11 months ago

I see, but why not report NPIDFLEAK with a modification (loss of N)? That would be a better way of reporting the identification, IMHO.

fcyu commented 11 months ago

That is because matching to PIDFLEAK (with ~0 delta mass and a similar score) is more reasonable than matching to NPIDFLEAK with a -114 Da delta mass. The "loss of N" is likely not real.

Best,

Fengchao

anesvi commented 11 months ago

Semi-tryptic peptides (e.g., from in-source fragmentation) are very common and are not relevant to the main purpose of open PTM searches; see our Crystal-C manuscript. If you do not want such peptides to be reclassified, just uncheck Crystal-C.
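To see which PSMs Crystal-C reclassified in a run before deciding whether to turn it off, one option is to compare the rank-1 search_hit per spectrum in the MSFragger pepXML with the one in the matching _c.pepXML (the two files compared earlier in this thread). The following is a hypothetical helper using only Python's standard library; it is not part of FragPipe, and it assumes the rank-1 hit is the one of interest.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def top_hits_from_root(root) -> dict:
    """Map spectrum title -> (peptide, massdiff) of the rank-1 search_hit."""
    hits = {}
    for query in root.iter():
        if local(query.tag) != "spectrum_query":
            continue
        for hit in query.iter():
            if local(hit.tag) == "search_hit" and hit.get("hit_rank") == "1":
                hits[query.get("spectrum")] = (
                    hit.get("peptide"), float(hit.get("massdiff")))
                break
    return hits


def top_hits(path: str) -> dict:
    """Parse a whole pepXML file (simple, but loads it fully into memory)."""
    return top_hits_from_root(ET.parse(path).getroot())


def changed_psms(raw: dict, adjusted: dict) -> dict:
    """Spectra whose rank-1 peptide differs between the two files."""
    return {
        spec: (raw[spec], adjusted[spec])
        for spec in sorted(raw.keys() & adjusted.keys())
        if raw[spec][0] != adjusted[spec][0]
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py RUN.pepXML RUN_c.pepXML
    diffs = changed_psms(top_hits(sys.argv[1]), top_hits(sys.argv[2]))
    for spec, (before, after) in diffs.items():
        print(f"{spec}\t{before[0]} ({before[1]:+.4f}) -> "
              f"{after[0]} ({after[1]:+.4f})")
```

For the example above, the scan 16040 spectrum would be listed as NPIDFLEAK (about -114.04 Da) changed to PIDFLEAK (about -0.0016 Da).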