Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

How to use the result for next analysis? #1341

Closed yangxinzhi closed 8 months ago

yangxinzhi commented 8 months ago

Hello. I searched the database with FragPipe using the default LFQ template and got the combined protein result. What is the difference between sample_Intensity and sample_MaxLFQ Intensity in the result file? Which one should I use for statistical analysis (that is, which one is the normalized result)? After selecting the right columns, I plan to keep only proteins that are detected in more than half of the samples, then log-transform the intensities and impute the missing values with a random forest. Is this analysis workflow OK?


fcyu commented 8 months ago

> What is the difference between sample_Intensity and sample_MaxLFQ Intensity in the result file? Which one should I use for statistical analysis (that is, which one is the normalized result)?

The sample_Intensity value is computed from the top-N peptides, and the sample_MaxLFQ Intensity value comes from the MaxLFQ algorithm. Both are normalized. In most cases, you should use sample_MaxLFQ Intensity.

> After selecting the right columns, I plan to keep only proteins that are detected in more than half of the samples, then log-transform the intensities and impute the missing values with a random forest. Is this analysis workflow OK?

We have FragPipe-Analyst for downstream analysis (http://fragpipe-analyst.nesvilab.org/). It can take the combined_protein.tsv from the LFQ-MBR workflow and perform routine analysis.

Best,

Fengchao
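For readers who want to reproduce the filtering and log steps from the question by hand, here is a minimal sketch. It is not how FragPipe-Analyst does it. It assumes that the quantification columns in combined_protein.tsv end in " MaxLFQ Intensity", that there is a "Protein ID" column, and that a zero or empty cell means "missing". The demo table is made up, and the random forest imputation step is left out.

```python
import csv
import io
import math

SUFFIX = " MaxLFQ Intensity"  # assumed column suffix in combined_protein.tsv


def filter_and_log2(rows, min_fraction=0.5, suffix=SUFFIX):
    """Keep proteins quantified in more than min_fraction of samples, then log2.

    Zero or empty intensities are treated as missing (assumption) and become None.
    """
    kept = {}
    for row in rows:
        cols = [c for c in row if c and c.endswith(suffix)]
        values = [float(row[c] or 0) for c in cols]
        n_present = sum(v > 0 for v in values)
        if cols and n_present > min_fraction * len(cols):
            kept[row["Protein ID"]] = [
                math.log2(v) if v > 0 else None for v in values
            ]
    return kept


# Tiny made-up table standing in for combined_protein.tsv (tab-separated).
demo = (
    "Protein ID\tS1 MaxLFQ Intensity\tS2 MaxLFQ Intensity\tS3 MaxLFQ Intensity\n"
    "P1\t1024\t2048\t0\n"
    "P2\t0\t0\t4096\n"
)
result = filter_and_log2(csv.DictReader(io.StringIO(demo), delimiter="\t"))
print(result)  # P1 is kept (2 of 3 samples), P2 is dropped (1 of 3)
```

With a real file, the `io.StringIO(demo)` argument would be replaced by an opened combined_protein.tsv; the `None` entries are the values an imputation step would later fill.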

yangxinzhi commented 8 months ago

Thank you very much for the nice answer. I'll try it right away!

yangxinzhi commented 8 months ago

By the way, I have another small question. I also searched this data with PD (a 480 was used to collect the 90 min LFQ runs). For the same data, PD gives me about 4000 proteins, but after FragPipe filtering (using MaxLFQ) only about 2000 proteins remain. [log_2023-11-18_15-35-45.txt](https://github.com/Nesvilab/FragPipe/files/13405603/log_2023-11-18_15-35-45.txt)

fcyu commented 8 months ago

There are 3360 proteins identified: `INFO[15:15:24] Converged to 0.98 % FDR with 3360 Proteins decoy=33 threshold=0.9787 total=3393`. The proteins were filtered with a global 1% PSM-level and protein-level FDR. When MaxLFQ is performed, the filter is quite stringent, so not all identified proteins have a quant value.

As for PD, the default FDR is 5%, not 1%, unless you changed the settings.

Best,

Fengchao
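The numbers in that log line can be checked with a little arithmetic. The sketch below assumes the FDR is estimated as decoys divided by targets; that formula is an assumption on my part, chosen because it reproduces the reported 0.98 %.

```python
import re

# Log line quoted in the reply above.
line = (
    "INFO[15:15:24] Converged to 0.98 % FDR with 3360 Proteins "
    "decoy=33 threshold=0.9787 total=3393"
)

decoy = int(re.search(r"decoy=(\d+)", line).group(1))
total = int(re.search(r"total=(\d+)", line).group(1))
target = total - decoy  # entries passing the threshold that are not decoys

# Assumption: FDR = decoys / targets, expressed as a percentage.
fdr_percent = 100 * decoy / target
print(target, round(fdr_percent, 2))  # 3360 0.98
```

So the 3360 figure is the count of target proteins that pass the 1% protein-level filter, before any quantification requirement is applied.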

yangxinzhi commented 8 months ago

You mean it is because FragPipe's LFQ template uses 0.01 for the FDR? Is that the --sequential --prot 0.01 shown in my screenshots, or the MBR 0.01 in Quant?

fcyu commented 8 months ago

Yes. The MaxLFQ min ions (it strongly affects the number of non-zero protein intensities), min scans, and min isotopes settings also matter.

Best,

Fengchao
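To see the gap between identified proteins and proteins with a non-zero intensity in your own results, you could count both from the protein table. This is a rough sketch with the same assumptions as before: quantification columns end in " MaxLFQ Intensity", and zero or empty means no quant value. The demo rows are made up.

```python
import csv
import io

SUFFIX = " MaxLFQ Intensity"  # assumed column suffix in combined_protein.tsv


def count_quantified(rows, suffix=SUFFIX):
    """Return (identified, quantified).

    identified: every protein row in the table.
    quantified: rows with a non-zero intensity in at least one sample.
    """
    identified = quantified = 0
    for row in rows:
        identified += 1
        values = [
            float(v or 0) for c, v in row.items() if c and c.endswith(suffix)
        ]
        if any(v > 0 for v in values):
            quantified += 1
    return identified, quantified


# Tiny made-up table standing in for combined_protein.tsv (tab-separated).
demo = (
    "Protein ID\tA MaxLFQ Intensity\tB MaxLFQ Intensity\n"
    "P1\t5000\t0\n"
    "P2\t0\t0\n"
    "P3\t1200\t900\n"
)
counts = count_quantified(csv.DictReader(io.StringIO(demo), delimiter="\t"))
print(counts)  # (3, 2)
```

Running this on the tables from two runs with different min ions settings would show how much that setting changes the second number.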

yangxinzhi commented 8 months ago

Sorry, what I said above may have been unclear. If I want to change the FDR to 0.05, do I need to change the MBR ion FDR under the Quant module to 0.05? Or do I just need to change --sequential --prot 0.01 to 0.05?

fcyu commented 8 months ago

I don't think you should change the FDR threshold. What PD uses is too liberal. But you can change the MaxLFQ min ions to 1.

Best,

Fengchao