Questions in outputting peptide report at peptide 1% FDR

Fiona-dym commented 8 years ago

Can you briefly explain how PeptideShaker filters peptides at a given peptide-level FDR?

I ran a test using Mascot as the search engine and then used PeptideShaker for the post-processing step. I am focusing on the peptide report at 1% peptide FDR.

Strangely, some peptides with high PSM scores and good e-values are missing from the peptide report. For example: sequence: TTVLAMDQVPR, ion score = 54.89, identity score = 24, e-value = 4.28128294901141e-05. Why is this peptide missing from the final peptide report? Do you have any ideas?

mvaudel commented 8 years ago

Hi,

Peptides are filtered according to the import filters, which can be edited in the advanced settings of the identification parameters. If you are working with Mascot only, it is important to provide PeptideShaker with the search settings used; otherwise modifications and mass tolerances will not be set correctly and peptides will be filtered out as a result.

Hope this helps. If you can provide more information on the dataset and how it was searched, we will be happy to look into it in more detail.

Marc

Fiona-dym commented 8 years ago

Hi Marc,

Thank you for your reply! I used the commands below to run PeptideShaker; please have a look.

I have also put all the corresponding files in my Dropbox and hope you can take a look at the data:
cpsx file: https://www.dropbox.com/s/c2cxhmgl2kku3or/Delete_peaks-iTRAQ8-111-mouse_8plex.Deisotope-net.new-MASCOT.cpsx?dl=0
mgf: https://www.dropbox.com/s/tc0g751nr8idjwt/Delete_peaks-iTRAQ8-111-mouse_8plex.Deisotope-net.new.mgf?dl=0
peptide report: https://www.dropbox.com/s/0596n1uaqxbiyxt/MASCOT_Delete_peaks-iTRAQ8-111-mouse_8plex.Deisotope-net.new_mouse_1_Default_Peptide_Report.txt?dl=0
database: https://www.dropbox.com/s/hfop9w7bce9gk16/swissprot_mouse_20160112target_decoy.fasta?dl=0
missing peptides: https://www.dropbox.com/s/largv0pyeuebnr0/Missing_peptide.txt?dl=0

The file "Missing_peptide.txt" lists 20 unique peptides with high ion scores that are recorded in the PSM report at 1% PSM FDR but are absent from the peptide report at 1% peptide FDR. I hope you can check them. Thanks!

Mascot search parameters:
Enzyme: Trypsin
Fixed modifications: Carbamidomethyl (C), iTRAQ8plex (K), iTRAQ8plex (N-term)
Variable modifications: Oxidation (M), iTRAQ8plex (Y)
Peptide Mass Tolerance: 10 ppm
Fragment Mass Tolerance: 0.05 Da
Quantitation: None
Instrument type: Default
Decoy: 0
Mass values: Monoisotopic
Database: autoDecoy_mouse

java -Djava.awt.headless=true -cp ./SearchGUI-2.6.5/SearchGUI-2.6.5.jar eu.isas.searchgui.cmd.IdentificationParametersCLI -out search_new.par -variable_mods "iTRAQ 8-plex of Y,Oxidation of M" -fixed_mods "Carbamidomethylation of C,iTRAQ 8-plex of K,iTRAQ 8-plex of peptide N-term" -db ./swissprot_mouse_20160112target_decoy.fasta -prec_tol 10 -frag_tol 0.05 -max_charge 5 -max_isotope 1 -useGeneMapping 0 -updateGeneMapping 0 -annotation_level 0.02 -import_peptide_length_min 7 -import_peptide_length_max 45 -ptm_score 2 -score_neutral_losses 0 -db_pi ./swissprot_mouse_20160112target_decoy.fasta -psm_fdr 1 -peptide_fdr 1 -protein_fdr 1

java -Xmx8G -cp ./PeptideShaker-1.7.6/PeptideShaker-1.7.6.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI -experiment MASCOT_XXX -sample mouse -replicate 1 -identification_files ./XXX.asc.dat -out ./XXX-MASCOT.cpsx -spectrum_files ./XXX.mgf -id_params search_new.par -temp_folder tmp/

java -Xmx8G -cp ./PeptideShaker-1.7.6/PeptideShaker-1.7.6.jar eu.isas.peptideshaker.cmd.ReportCLI -in ./XXX-MASCOT.cpsx -out_reports output -temp_folder export_tmp/ -reports 3,5,7,8

mvaudel commented 8 years ago

Hi again and thank you for these files,

If you open the project in the interface, you will see that your peptides are there but did not pass the validation threshold (red cross in the validation column). These peptides score well and in fact pass the 1% FDR limit at the PSM level. But since there are fewer peptides than PSMs, a 1% FDR is stricter at the peptide level, and those good PSMs do not make it into your final list.
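To illustrate why the same hit can pass at the PSM level and fail at the peptide level, here is a minimal sketch in Python. It assumes a plain decoys/targets estimate over a score-ranked list, which is a simplification and not PeptideShaker's actual scoring or validation procedure. The peptide names, scores and the `Hit`, `accepted_targets` and `best_per_peptide` helpers are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    peptide: str
    score: float
    is_decoy: bool


def accepted_targets(hits, max_fdr):
    """Rank hits by score and keep targets in the longest prefix whose
    estimated FDR (decoys / targets) stays at or below max_fdr."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    targets = decoys = cut = 0
    for i, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cut = i
    return [h for h in ranked[:cut] if not h.is_decoy]


def best_per_peptide(psms):
    """Collapse PSMs to one entry per peptide, keeping the best score."""
    best = {}
    for hit in psms:
        if hit.peptide not in best or hit.score > best[hit.peptide].score:
            best[hit.peptide] = hit
    return list(best.values())


# Toy data (made up): 60 target peptides with 3 PSMs each, one high-scoring
# decoy, and one target peptide with a single PSM just below that decoy.
psms = [
    Hit(f"TARGET_{i}", 100 - 0.5 * i - 0.1 * j, False)
    for i in range(60)
    for j in range(3)
]
psms.append(Hit("DECOY_0", 55.0, True))
psms.append(Hit("BORDERLINE", 54.0, False))

psm_pass = accepted_targets(psms, max_fdr=0.01)
pep_pass = accepted_targets(best_per_peptide(psms), max_fdr=0.01)

# One decoy among 181 target PSMs is below 1%, but one decoy among
# 61 target peptides is above 1%.
print("BORDERLINE passes at PSM level:    ", any(h.peptide == "BORDERLINE" for h in psm_pass))
print("BORDERLINE passes at peptide level:", any(h.peptide == "BORDERLINE" for h in pep_pass))
```

In this toy set the single decoy is diluted by 181 target PSMs but only by 61 target peptides, so the borderline hit is accepted in the first case and rejected in the second.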

In order to "rescue" them, you can opt for another validation threshold in the validation tab, but I am not sure that is a real solution. The actual problem is that you have very high-scoring decoy hits, and thus most likely high-scoring false positives as well. You can reduce the prevalence of false positives by tuning the search settings: for example, you could use a more stringent MS2 m/z tolerance, as the fragment ion deviation is very low in your dataset, and decrease the number of allowed missed cleavages. Your MS1 deviation is quite high, so you could try to recalibrate your spectra and search them again with more stringent tolerances. You might also want to try different search engines: if multiple algorithms agree on the identification of a peptide, it increases your confidence in that identification.
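As a rough illustration of what checking and correcting the MS1 deviation could look like, the sketch below computes precursor mass errors in ppm and removes their median offset. The m/z values are invented, and a single global median shift is an assumption made for brevity; dedicated recalibration tools are likely to model the deviation in more detail.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical (observed, theoretical) precursor m/z pairs taken from
# confident identifications; the numbers are illustrative only.
pairs = [
    (500.2535, 500.2500),
    (612.3341, 612.3300),
    (705.8852, 705.8800),
    (433.7129, 433.7100),
    (821.4461, 821.4400),
]

errors = [ppm_error(obs, theo) for obs, theo in pairs]
offset = median(errors)  # systematic shift shared by most precursors

# Remove the systematic shift from each observed m/z.
recalibrated = [(obs / (1 + offset * 1e-6), theo) for obs, theo in pairs]
residuals = [ppm_error(obs, theo) for obs, theo in recalibrated]

print(f"median error before: {offset:.2f} ppm")
print(f"largest residual after: {max(abs(r) for r in residuals):.2f} ppm")
```

If the residuals after such a correction are clearly smaller than the original errors, a tighter precursor tolerance than the one used for the first search may be justified.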

Hope this helps,

Marc

mvaudel commented 8 years ago

This might not be related, but your database does not include standard contaminants and trypsin. Adding these sequences to your FASTA file will decrease the risk of false positives. You will find more information on this in our tutorials; see chapter 1.1 for the database creation: https://compomics.com/bioinformatics-for-proteomics/identification/
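A minimal sketch of appending contaminant sequences to a FASTA file is shown below. The file names, accessions and sequences are placeholders and the `merge_fasta` helper is hypothetical; it is not part of PeptideShaker or SearchGUI.

```python
import tempfile
from pathlib import Path


def merge_fasta(sources, out_path):
    """Concatenate FASTA files, skipping entries whose accession
    (first token of the header) has already been written."""
    seen = set()
    written = 0
    with open(out_path, "w") as out:
        for source in sources:
            keep = False
            with open(source) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        accession = fields[0] if fields else ""
                        keep = accession not in seen
                        seen.add(accession)
                        if keep:
                            written += 1
                    if keep:
                        out.write(line if line.endswith("\n") else line + "\n")
    return written


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    targets = Path(tmp, "targets.fasta")
    contaminants = Path(tmp, "contaminants.fasta")
    merged = Path(tmp, "targets_plus_contaminants.fasta")
    targets.write_text(">TARGET_1 placeholder protein\nMKTAYIAK\n")
    contaminants.write_text(
        ">CONT_TRYPSIN placeholder contaminant\nAAAKLLLR\n"
        ">TARGET_1 placeholder protein\nMKTAYIAK\n"  # duplicate, skipped
    )
    n_entries = merge_fasta([targets, contaminants], merged)
    merged_text = merged.read_text()

print(n_entries, "entries written")
```

Since the database in this thread already contains decoys, the decoy sequences would presumably need to be regenerated from the merged target file so that the contaminants get decoy counterparts as well.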

hbarsnes commented 8 years ago

Issue assumed resolved. If this is not the case, please let us know and we'll reopen the issue.

compomics / peptide-shaker

Questions in outputting peptide report at peptide 1% FDR #170