Nesvilab / FragPipe

A cross-platform graphical user interface (GUI) for running MSFragger- and Philosopher-powered pipelines for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

TMT quantification results and data processing #1335

Closed: Me9atron closed this issue 7 months ago

Me9atron commented 8 months ago

I finished my TMT-16 analysis using FragPipe v20. I was amazed by the dramatically increased number of quantified peptides, but I am puzzled by the different numbers of significant proteins obtained from the TMT quantification results of FragPipe and MaxQuant. Here is a summary of the results and data processing. Using the same dataset and database (FASTA), I set the search parameters in FragPipe to be almost the same as in MaxQuant. With TMT MS3, MaxQuant gave 69,477 quantified peptides and 6,628 quantified proteins, while FragPipe produced 82,242 quantified peptides and 7,032 quantified proteins. Despite the substantially higher number of peptide IDs, FragPipe performed nearly as well as MaxQuant in CV distribution (86% of proteins have a CV below 20%).

First, we normalized the peptide-level quantification results by dividing each peptide's reporter ion abundance by the sum of all peptides' reporter ion abundances in the same channel. We then took the median of the normalized peptide values for each protein ID to obtain protein-level quantification. After calculating the Bonferroni-adjusted p-values and fold changes between the two groups (drug-treated vs. untreated), we identified significant proteins from volcano plots. What puzzles me most is that the number of significant proteins is much smaller for FragPipe than for MaxQuant (20 vs. 100).
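The post-processing described above can be sketched in Python as follows. This is a minimal sketch, not the poster's actual script: it assumes a complete peptide-by-channel matrix (no missing reporter-ion values), a known peptide-to-protein mapping, and fold change computed as the ratio of group means, which is one common choice among several. All function names are hypothetical. The thread does not say which test produced the raw p-values, so only the Bonferroni adjustment is shown.

```python
import math
from collections import defaultdict
from statistics import median


def sum_normalize(peptides):
    """Divide each reporter-ion abundance by the total abundance of its channel.

    `peptides` maps a peptide key to (protein_id, [abundance per channel]).
    Assumes every peptide has a value in every channel.
    """
    n_channels = len(next(iter(peptides.values()))[1])
    totals = [sum(vals[c] for _, vals in peptides.values()) for c in range(n_channels)]
    return {
        key: (protein, [vals[c] / totals[c] for c in range(n_channels)])
        for key, (protein, vals) in peptides.items()
    }


def median_rollup(normalized):
    """Protein value in each channel = median of its peptides' normalized values."""
    by_protein = defaultdict(list)
    for protein, vals in normalized.values():
        by_protein[protein].append(vals)
    return {
        protein: [median(channel) for channel in zip(*rows)]
        for protein, rows in by_protein.items()
    }


def log2_fold_change(values, treated, untreated):
    """log2 of mean(treated channels) / mean(untreated channels)."""
    mean_t = sum(values[i] for i in treated) / len(treated)
    mean_u = sum(values[i] for i in untreated) / len(untreated)
    return math.log2(mean_t / mean_u)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, with three peptides from two proteins across four channels (channels 0 and 1 untreated, 2 and 3 treated), `median_rollup(sum_normalize(peptides))` yields one normalized value per protein and channel, which can then be passed to `log2_fold_change` with the channel indices of each group. Because Bonferroni multiplies every p-value by the number of proteins tested, the adjusted cutoff tightens as more proteins are quantified.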

For the FragPipe analysis, I used the "abundance_peptide_none_tsv" file in the "tmt-report" folder to obtain the results above (the peptide.tsv file gave the same results), whereas the abundance_peptide_MD_tsv file gave much worse results. I also tried unchecking "outlier removal", but this did not help. I wonder whether the TMT isotope correction factors could cause this effect, since there is no setting for entering these factors. Would that be possible? Could you give me some ideas or suggestions for figuring out why my significant-protein results differ so much between FragPipe and MaxQuant? Thank you very much.

Best wishes,

Zhaowei

anesvi commented 8 months ago

You are doing something wrong with normalization. Please first use the abundance_gene_MD file, which is already normalized. This is what we always use, and it works great.
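Because the original post normalizes by channel sums, a minimal sketch of median centering may help show how an already normalized "MD" file differs. It assumes that "MD" in the TMT-Integrator output names refers to median centering of log2 abundances within each channel; check the FragPipe/TMT-Integrator documentation for the exact procedure, which may involve additional steps. The function name `median_center` is hypothetical, and the input is assumed to contain no missing values.

```python
import math
from statistics import median


def median_center(matrix):
    """Median-center log2 abundances within each channel.

    `matrix` is a list of rows (one per gene/protein), each a list of raw
    abundances per channel. Returns log2 values shifted so that every
    channel's median is 0.
    """
    log_rows = [[math.log2(v) for v in row] for row in matrix]
    # One median per channel (column), computed on the log2 scale.
    channel_medians = [median(channel) for channel in zip(*log_rows)]
    return [[v - m for v, m in zip(row, channel_medians)] for row in log_rows]
```

In a toy matrix such as `[[100, 200], [400, 800], [1600, 3200]]`, the second channel carries exactly twice the signal of the first for every row; after centering, the two channels are identical. That is the goal when loading differences are technical, but the same step would also remove a difference in sample amount that was introduced on purpose.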

Me9atron commented 8 months ago

Hello Alexey,

Thank you for your advice. I tried the abundance_gene_MD file without applying any normalization on my side; unfortunately, the results were just as poor as those from abundance_peptide_MD_tsv combined with the normalization I described above. The results from abundance_peptide_none_tsv are better than those from either abundance_peptide_MD_tsv or abundance_gene_MD.

So far, in our benchmarking, FragPipe has performed as well as or better than MaxQuant. In particular, semi-tryptic peptide identifications (from a separate search) increased 2-3 fold compared with MaxQuant, possibly because MaxQuant does not allow more than 0 missed cleavages in semi-specific search settings. However, our only remaining problem with FragPipe is the much smaller number of significant proteins compared with MaxQuant (20 vs. 100), which puzzles us most. The way TMT reporter ion abundances are calculated differs between software packages, which will affect TMT quantification results to some extent. On this point, I read an article comparing FragPipe and Proteome Discoverer (PD), attached below: the two tools were consistent in quantified proteins and other evaluation aspects, but they showed noticeable differences in the sets of differentially expressed proteins (DEPs).

I do not know much about the algorithms used to determine TMT reporter ion abundances, but I am deeply intrigued by this phenomenon and hope to understand the reasons. As experts in this area, could you share some insights on interpreting and handling the TMT-quantification-related differences in differentially expressed proteins between FragPipe and MaxQuant in my case? Thank you.

Best wishes,

Zhaowei

Attachment: he-et-al-2022-comparative-evaluation-of-proteome-discoverer-and-fragpipe-for-the-tmt-based-proteome-quantification.pdf

anesvi commented 7 months ago

Unfortunately, I do not know what to tell you. I have never seen a dataset where different software (PD, MaxQuant, FragPipe) gave drastically different differential expression results. Perhaps there is something in how you post-process the data. I suggest you follow our standard workflow: take the abundance_gene_MD file and use FragPipe-Analyst for differential expression and pathway analysis. We know it works well because we have tested it on many datasets. The _None files (no normalization) can sometimes be used too, if the sample amounts in the channels were intentionally different.