Difference R^2~5% in runs with deisotope to 0 and 1

Nesvilab / FragPipe

A cross-platform proteomics data analysis suite

http://fragpipe.nesvilab.org



Difference R^2~5% in runs with deisotope to 0 and 1 #454

Closed: animesh closed this issue 3 years ago

animesh commented 3 years ago

I notice that the correlation (R^2) between three replicates run with deisotope=0 (results folder at https://drive.google.com/drive/folders/1RDHEK_9-MKys6Y0XvFSfP1wg1OeoZMfI?usp=sharing) and deisotope=1 (https://drive.google.com/drive/folders/1Rc-eZelMDw_4UM3x4Hwpx0MZWuRX-NaN?usp=sharing) is only about 0.95, so I am wondering what the best practice is.

dpolasky commented 3 years ago

Hi Ani, it is expected that deisotoping MS2 spectra will change the results to some extent, because it leads to additional or different PSMs being identified (for example, it looks like your deisotoping runs identify about 5% more PSMs than the runs without deisotoping). I'm not sure what you're plotting to look at the correlation (protein-level quant?), but I wouldn't be concerned about small differences in quant unless there appears to be a systematic bias of some kind. We generally recommend deisotoping since it improves search results. Dan

fcyu commented 3 years ago

Hi @animesh ,

When I was checking your results, I found multiple logs with multiple config files. I guess you ran the same data multiple times. To make your results easier to inspect and less error-prone, could you delete all unrelated files and make sure that the results come from the correct parameters?

Best,

Fengchao

animesh commented 3 years ago

Sorry for the mess, @fcyu. For some strange reason it keeps crashing until it works. Anyway, I hope I have now managed clean runs (I also added D=2, though I am not sure what it is) and uploaded them at https://drive.google.com/drive/folders/1TGxxdDLvk4vQWVgrZMmC5xnA7r4Um2l-?usp=sharing .

One thing I notice is that those correlations (@dpolasky yes, these are protein-level SILAC quant ratios from FragPipe) have also changed a bit. Is that expected?
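For readers who want to reproduce this kind of comparison, here is a minimal sketch of how an R^2 between two runs could be computed: match proteins quantified in both runs and take the squared Pearson correlation of their log-ratios. The function name `r_squared` and the plain dict input (protein ID to log-ratio) are hypothetical choices for illustration; they are not part of FragPipe, and you would first need to load the ratios from your own output tables.

```python
def r_squared(run_a: dict, run_b: dict) -> float:
    """Squared Pearson correlation of log-ratios for proteins present in both runs.

    Hypothetical helper: run_a and run_b map protein ID -> log-ratio.
    """
    shared = sorted(run_a.keys() & run_b.keys())  # only proteins quantified in both runs
    if len(shared) < 2:
        raise ValueError("need at least two shared proteins")
    xs = [run_a[p] for p in shared]
    ys = [run_b[p] for p in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

Because R^2 is computed over all shared proteins, a handful of proteins whose ratios change sharply between runs can pull it noticeably below 1 even when most proteins agree closely.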

fcyu commented 3 years ago

Thanks for your files. I looked into the difference between D and D1. It seems that the difference is mostly due to different peptides being assigned to the proteins. Take the most extreme case as an example: sp|Q5QNW6|H2B2F_HUMAN 0.065533277 -5.165024683. After checking the ion_label_quant.tsv files, I found that three peptides, ESYSVYVYK, KESYSVYVYK, and SRKESYSVYVYK, were assigned to different proteins, which causes the big difference in the log-ratio. I expect the peptide-level and ion-level scatter plots to show higher correlations.
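The effect described above can be illustrated with a small sketch: the same peptide log-ratios produce very different protein log-ratios when a few peptides are assigned to a different protein. This assumes a simplified roll-up (median of the assigned peptides' log-ratios), which is a stand-in and not FragPipe's actual protein quantification. The function names, peptide names, protein names, and numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def protein_log_ratios(peptide_log_ratios: dict, assignment: dict) -> dict:
    """Roll peptide log-ratios up to proteins using the median (simplified stand-in)."""
    by_protein = defaultdict(list)
    for peptide, ratio in peptide_log_ratios.items():
        protein = assignment.get(peptide)
        if protein is not None:  # skip peptides not assigned in this run
            by_protein[protein].append(ratio)
    return {protein: median(ratios) for protein, ratios in by_protein.items()}


def moved_peptides(assignment_a: dict, assignment_b: dict) -> list:
    """Peptides seen in both runs but assigned to different proteins."""
    shared = assignment_a.keys() & assignment_b.keys()
    return sorted(p for p in shared if assignment_a[p] != assignment_b[p])


# Hypothetical example: identical peptide ratios, different protein assignments.
ratios = {"PEP1": 0.1, "PEP2": 0.0, "PEP3": -5.0, "PEP4": -5.2}
run_a = {"PEP1": "PROT_A", "PEP2": "PROT_A", "PEP3": "PROT_B", "PEP4": "PROT_B"}
run_b = {"PEP1": "PROT_A", "PEP2": "PROT_B", "PEP3": "PROT_A", "PEP4": "PROT_A"}
print(protein_log_ratios(ratios, run_a))
print(protein_log_ratios(ratios, run_b))
print(moved_peptides(run_a, run_b))
```

Here PROT_A's log-ratio moves from about 0.05 to -5.0 purely because three peptides changed proteins, while no peptide-level ratio changed at all.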

Best,

Fengchao

fcyu commented 3 years ago

Also, I don't think a scatter plot comparing different quantification analyses is very meaningful, because of the normalization and peptide-to-protein roll-up (MaxLFQ) algorithms. Normalization and MaxLFQ adjust the intensities based on the information in the current analysis. Different parameters result in different adjustments, so the intensities are not comparable across analyses. That is why we normally perform normalization and peptide-to-protein roll-up with all runs together.
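A toy example makes the point about analysis-dependent adjustment concrete. The sketch below uses median centering of log-ratios as a simplified stand-in for normalization (it is not FragPipe's actual normalization or MaxLFQ): when one analysis identifies a few extra peptides, its median shifts, so the same raw peptide value ends up with a different normalized value. The function name, peptide names, and numbers are hypothetical.

```python
from statistics import median


def median_center(log_ratios: dict) -> dict:
    """Subtract the analysis-wide median so the typical log-ratio becomes 0."""
    shift = median(log_ratios.values())
    return {key: value - shift for key, value in log_ratios.items()}


# Hypothetical analyses: the second one identified two extra peptides.
analysis_1 = {"PEP1": 0.2, "PEP2": 0.4, "PEP3": 0.6}
analysis_2 = {"PEP1": 0.2, "PEP2": 0.4, "PEP3": 0.6, "PEP4": 1.0, "PEP5": 1.2}

norm_1 = median_center(analysis_1)  # median 0.4, so PEP1 -> about -0.2
norm_2 = median_center(analysis_2)  # median 0.6, so PEP1 -> about -0.4
print(norm_1["PEP1"], norm_2["PEP1"])
```

PEP1 has the same raw log-ratio in both analyses but a different normalized value, which is why values from separately normalized analyses do not line up on a scatter plot even when the underlying measurements agree.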

Best,

Fengchao

animesh commented 3 years ago

Thanks, @fcyu, for looking into this 👍🏽 I am wondering what D=2 means and what your recommendation for D would be in a timsTOF Pro SILAC analysis. Could you also comment on the search setup in general?

fcyu commented 3 years ago

D=2 means that peaks for which deisotoping fails can have fragment charge 1 or 2. D=1 means that peaks for which deisotoping fails can only have fragment charge 1. In most cases, D=1 is better than both D=2 and D=0. I also suggest using D=1 for your timsTOF SILAC analysis.
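To make the D=1 versus D=2 distinction concrete, here is a minimal sketch of what each setting implies for a fragment peak whose charge deisotoping could not determine: D=1 allows only charge 1, while D=2 also considers charge 2, giving a second candidate fragment mass per peak. The function names are hypothetical and this is not MSFragger's internal code; it only restates the rule described above, using neutral mass M = z * (m/z - proton mass). D=0 is treated as out of scope because it skips deisotoping entirely.

```python
PROTON_MASS = 1.007276466812  # mass of a proton in Da


def singleton_charges(deisotope: int) -> tuple:
    """Fragment charges allowed for a peak whose charge deisotoping could not determine."""
    if deisotope == 1:
        return (1,)
    if deisotope == 2:
        return (1, 2)
    # D=0 performs no deisotoping, so there are no "failed" peaks to handle here.
    raise ValueError("only D=1 and D=2 apply; D=0 skips deisotoping")


def neutral_masses(mz: float, deisotope: int) -> dict:
    """Neutral fragment mass implied by each allowed charge: M = z * (m/z - proton mass)."""
    return {z: z * (mz - PROTON_MASS) for z in singleton_charges(deisotope)}


print(neutral_masses(500.0, 1))  # one candidate mass (z=1)
print(neutral_masses(500.0, 2))  # two candidate masses (z=1 and z=2)
```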

Best,

Fengchao