Nesvilab / MSFragger

Ultrafast, comprehensive peptide identification for mass spectrometry–based proteomics
https://msfragger.nesvilab.org

Differences in quans & identified peptides with MSFragger in PD-node vs FragPipe #274

Closed FriedLabJHU closed 1 year ago

FriedLabJHU commented 1 year ago

Hello,

We are experiencing a significant coverage issue when using MSFragger in FragPipe relative to PD. The largest difference is in the total number of identified peptides; the second is in the quans reported for peptides identified by both FragPipe and PD.

We are running LFQs between 2 experimental conditions (R and N) with 3 replicates each. Using the same raw files, fasta, and MSFragger parameters, we identified 29,000 peptides for ~1,400 proteins in PD but only 11,000 peptides for ~1,300 proteins in FragPipe. When comparing the volcano plots of the peptides, we noticed that the FragPipe quans are much larger than those reported in PD (see picture below).

Have you experienced this before? If so, do you have any recommendations on the possible source for these discrepancies?

[Image: msf_pd_fp, volcano plots of the PD and FragPipe results]

anesvi commented 1 year ago

It's the same MSFragger in PD and FragPipe. Did you filter the PD results to 1% FDR at the peptide and protein levels? Also, are you sure you are counting peptides and not PSMs in PD? 29k vs 11k: I have no idea how the post-processing (after MSFragger) in PD could give so much more. As far as the MS1 abundances, I do not know what PD is doing; it's a black box, and I do not know many people who use MS1 quant in PD.
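
For example, counting peptides rather than PSMs from a FragPipe psm.tsv could look like this (a minimal sketch; the "Peptide" column name is an assumption and may differ by version):

```python
# Minimal sketch: peptides vs. PSMs in a FragPipe psm.tsv.
# Assumes a tab-separated file with a "Peptide" column.
import pandas as pd

psms = pd.read_csv("psm.tsv", sep="\t")

n_psms = len(psms)                      # one row per spectrum match
n_peptides = psms["Peptide"].nunique()  # distinct peptide sequences

print(f"{n_psms} PSMs collapse to {n_peptides} unique peptides")
```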

anesvi commented 1 year ago

Also, what output file from FragPipe are you using? Perhaps Fengchao can take a look at your files.

fcyu commented 1 year ago

Hi @FriedLabJHU ,

Could you share your log files? You need to upload them to this GitHub issue; replying to the email will truncate the files.

Best,

Fengchao

FriedLabJHU commented 1 year ago

@anesvi We are counting peptides. In our analysis to produce the volcano plots, we take the abundance from both conditions, take the ratio for every ion (consensus feature in PD), and then take the median of all the ratios. We do this on a per-peptide basis. Here is a link to a OneDrive folder with the outputs we are using from PD and FragPipe.
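
In code, the calculation is roughly the following (a sketch only; the input file and column names are hypothetical stand-ins for our exported ion abundances):

```python
# Rough sketch of the per-peptide median-of-ratios calculation.
# "ion_abundances.csv" and its column names are hypothetical placeholders.
import numpy as np
import pandas as pd

ions = pd.read_csv("ion_abundances.csv")

# Ratio per ion (consensus feature in PD), condition R over condition N.
ions["ratio"] = ions["abundance_R"] / ions["abundance_N"]

# Median of the ion ratios within each peptide.
peptide_ratios = ions.groupby("peptide")["ratio"].median()

# log2 transform for the volcano plot's x-axis.
log2_ratios = np.log2(peptide_ratios)
```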

FriedLabJHU commented 1 year ago

Here are the final outputs we obtained from the analysis: Results.xlsx

FriedLabJHU commented 1 year ago

After all of our analysis, these are the peptides selected as "significant": pd_2way_peptides.csv fp_2way_3.2_peptides.csv fp_2way_peptides.csv

anesvi commented 1 year ago

Fengchao is on vacation, so he may not respond or look at the data for a while. Did you check whether the peptides were identified in FragPipe but quantified in only one condition? Then you would not have a ratio and they would not be counted. If so, we need to check why they were not quantified by IonQuant. But we have stringent criteria for feature detection (at least 2 isotopes and at least 3 points) and strictly controlled MBR. I think PD has far more liberal filters, certainly for MBR. But we need to take a look at the data when we have time; this is just guessing.
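
One quick way to run that check (a sketch, assuming a FragPipe combined_peptide.tsv-style table whose intensity columns are named after the experiment groups; the prefixes below are guesses):

```python
# Sketch: find peptides with intensity in only one condition (hence no ratio).
# Assumes per-experiment intensity columns; adjust the prefixes as needed.
import pandas as pd

pep = pd.read_csv("combined_peptide.tsv", sep="\t")

n_cols = [c for c in pep.columns if c.startswith("Native_")]    # assumed naming
r_cols = [c for c in pep.columns if c.startswith("Refolded_")]  # assumed naming

has_n = (pep[n_cols] > 0).any(axis=1)
has_r = (pep[r_cols] > 0).any(axis=1)

one_sided = pep[has_n ^ has_r]  # quantified in exactly one condition
print(f"{len(one_sided)} peptides have intensity in only one condition")
```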

FriedLabJHU commented 1 year ago

Certainly, I completely understand. One other thing I noticed is that there are quans in PD where FragPipe shows no match, so I agree that PD is more liberal in calling PSMs. Downstream, the results don't change dramatically; I was just curious as to why the quan magnitudes are so different.

Thank you both! Looking forward to your replies later on.

fcyu commented 1 year ago

Hi @FriedLabJHU ,

Thanks for the shared files. It seems that FragPipe and PD used different sets of files: FragPipe has 6 files but PD has 24 files.

FragPipe:

```
Experiment/Group: Native_LiP_1
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_NL1.raw   DDA
Experiment/Group: Native_LiP_2
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_NL2.raw   DDA
Experiment/Group: Native_LiP_3
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_NL3.raw   DDA
Experiment/Group: Refolded_LiP_1_min_1
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_R1_1min.raw   DDA
Experiment/Group: Refolded_LiP_1_min_2
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_R2_1min.raw   DDA
Experiment/Group: Refolded_LiP_1_min_3
- Z:\friedlab\EM_DATA\FRAPPIPE_ANAYLSIS\HMM_TEST\data\20220109_HMM_ThermusRefolding_R3_1min.raw   DDA
```

PD: [Image: screenshot of PD's input file list (24 files)]

Could you double-check that you shared the correct PD and FragPipe result files? If so, you need to re-run FragPipe with the same set of input files and matching parameters.
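
As a trivial pre-check (illustrative only; fill in the PD set from PD's input-file view), you can confirm the two runs consumed identical raw files:

```python
# Illustrative sanity check that both tools searched the same raw files.
fragpipe_files = {
    "20220109_HMM_ThermusRefolding_NL1.raw",
    "20220109_HMM_ThermusRefolding_NL2.raw",
    "20220109_HMM_ThermusRefolding_NL3.raw",
    "20220109_HMM_ThermusRefolding_R1_1min.raw",
    "20220109_HMM_ThermusRefolding_R2_1min.raw",
    "20220109_HMM_ThermusRefolding_R3_1min.raw",
}
pd_files = set()  # placeholder: paste the file names from PD's input list here

print("only in PD:", sorted(pd_files - fragpipe_files))
print("only in FragPipe:", sorted(fragpipe_files - pd_files))
```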

Also, note that PD's FDR filtering is quite liberal...

Best,

Fengchao

FriedLabJHU commented 1 year ago

The files are the same. We only used the NL and RX_1min files in the PD run; both analyses were run with only those 6 files. We uploaded all of the files into the PD file system but only used a subset of them in the analysis. If you look at the .csv in the PD folder on OneDrive, the consensus features only contain the files associated with NL and RX_1min.

fcyu commented 1 year ago

The 20220109_HMM_ThermusRefolding_LFQ_R1min.csv file seems to be messed up. It is hard to read:

[Image: screenshot of the malformed csv]

Could you re-export and share the peptide-level report with me?

Thanks,

Fengchao

FriedLabJHU commented 1 year ago

Unfortunately, this is the only way for PD to output quans and identify which peptides and proteins they come from.

FriedLabJHU commented 1 year ago

The core issue we are facing is that PD is reporting non-zero intensities whereas FragPipe reports zeros for the same peptide ion.
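
To make the mismatch concrete, a comparison like the following (a sketch; the file names and the "ion"/"intensity" columns are placeholders for the two tools' exports) lists the affected ions:

```python
# Sketch: ions quantified by PD but reported as zero by FragPipe.
# File names and the "ion"/"intensity" column names are placeholders.
import pandas as pd

pd_ions = pd.read_csv("pd_ions.csv")
fp_ions = pd.read_csv("fragpipe_ions.csv")

merged = pd_ions.merge(fp_ions, on="ion", suffixes=("_pd", "_fp"))

pd_only = merged[(merged["intensity_pd"] > 0) & (merged["intensity_fp"] == 0)]
print(f"{len(pd_only)} ions are non-zero in PD but zero in FragPipe")
```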

fcyu commented 1 year ago

I apologize if I did not explain it clearly. Your csv files exported from PD were corrupted/unreadable. Please see the following screenshot: the lines were broken in the middle and mixed up.

[Image: screenshot of the broken csv rows]

I have had some PD results before and they looked good. For example, in the following screenshot, each row is formatted well:

[Image: screenshot of a well-formatted PD csv]

If opened in Excel, they are also well formatted:

[Image: screenshot of the PD csv opened in Excel]

You need to re-export the PD results. If you are not sure how to do that, you probably need some help from your lab...

BTW, I am curious how you could get meaningful information from such messed-up csv files.
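
One quick way to see the damage (a small sketch; assumes the export is comma-delimited) is to count the fields in each row against the header:

```python
# Sketch: flag csv rows whose field count differs from the header,
# which is what lines broken in the middle look like.
import csv

with open("20220109_HMM_ThermusRefolding_LFQ_R1min.csv", newline="") as fh:
    reader = csv.reader(fh)
    header = next(reader)
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"row {lineno}: {len(row)} fields, expected {len(header)}")
```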

Could you also share PD's log and parameter files with us? Without those files and the result files, it is hard to tell whether you performed a fair comparison.

Best,

Fengchao