MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.
questions about Q-values in MSGF+ #114

Open Linkous-02 opened 3 years ago

Linkous-02 commented 3 years ago

I am having some trouble understanding how Q-values are calculated in MSGF+:

I am running MSGF+ on one sample consisting of 10 fraction files. Each time I get the .tsv output, I remove the spectra whose Q-values are lower than 0.01 and the PSMs that map to decoy entries (so the spectra that were filtered out by the Q-value threshold are kept). I then run MSGF+ again on the remaining spectra with the same parameters and databases.
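The filtering step described above can be sketched roughly as follows. This is only an illustration under assumptions: the column names `Protein` and `QValue`, the decoy prefix `XXX_`, and the helper name `remaining_target_rows` are not taken from this thread and should be checked against the actual .tsv file. It also treats a PSM as a decoy only when its `Protein` field starts with the prefix, which ignores PSMs that list several proteins.

```python
import csv
import io


def remaining_target_rows(tsv_text, q_cutoff=0.01, decoy_prefix="XXX_"):
    """Return the rows that survive the filtering described in the post.

    Rows are dropped when they map to a decoy entry or when their Q-value is
    lower than q_cutoff. Column names and decoy prefix are assumptions.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        is_decoy = row["Protein"].startswith(decoy_prefix)
        passes_cutoff = float(row["QValue"]) < q_cutoff
        if not is_decoy and not passes_cutoff:
            kept.append(row)
    return kept
```

The returned rows identify the spectra that would go into the next search round; extracting those spectra from the original spectrum files is a separate step that is not shown here.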

According to the explanation in the MSGF+ paper ("For a threshold t, report the FDR as Ndecoy/Ntarget where Ntarget (Ndecoy) is the number of target (decoy) PSMs with spectral E-values equal or smaller than t"), I would expect that in every round on the remaining spectra there would still be some PSMs with FDR < 0.01. However, in the third round the Q-values of all PSMs were higher than 0.01.
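Written out, the quoted formula and the usual definition of a q-value (the smallest FDR at which a PSM is still accepted) look like this. The symbol $s$ for a PSM's spectral E-value is introduced here for illustration, and whether MS-GF+ applies exactly this minimum is an assumption, not something confirmed in this thread.

```latex
\mathrm{FDR}(t) = \frac{N_{\mathrm{decoy}}(t)}{N_{\mathrm{target}}(t)},
\qquad
q(s) = \min_{t \ge s} \mathrm{FDR}(t)
```

Here $N_{\mathrm{target}}(t)$ and $N_{\mathrm{decoy}}(t)$ count the target and decoy PSMs with spectral E-value equal to or smaller than $t$.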

So I wonder whether the calculation of Q-values in MSGF+ is slightly different from the formula in the MSGF+ paper.

I would be very grateful if someone could give me some pointers.

alchemistmatt commented 3 years ago

Your method of analysis is something I have never seen applied, and is, frankly, a bit dubious. I don't think you can trust the Q-Values on the searches after you removed the high confidence spectra and decoy proteins. As for how Q-Values are computed, please see either of these two Excel files, which demonstrate how to manually compute the Q-Values. I suggest you take the results from each of your searches and manually compute Q-Values as shown in these files, and compare to what MS-GF+ is reporting. Admittedly, the Q-Values in the Excel files don't exactly match what MS-GF+ reports, but they're close.
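For readers without the Excel files at hand, a manual target-decoy Q-value calculation can be sketched in a few lines. This is a minimal sketch under assumptions, not the MS-GF+ implementation: it uses the decoy/target ratio quoted from the paper, caps the estimate at 1.0, treats PSMs with identical SpecEValues as one threshold, and takes the running minimum from the worst score upward. The function name `compute_q_values` and the input layout are hypothetical.

```python
from typing import List, Tuple


def compute_q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Compute q-values for PSMs given as (spec_e_value, is_decoy) pairs.

    Returns the q-values in the same order as the input list.
    """
    n = len(psms)
    # Rank PSMs from best (smallest SpecEValue) to worst.
    order = sorted(range(n), key=lambda i: psms[i][0])

    # Forward pass: FDR estimate decoy/target at each rank.
    fdrs = []
    n_target = n_decoy = 0
    for i in order:
        if psms[i][1]:
            n_decoy += 1
        else:
            n_target += 1
        # Assumed convention: cap at 1.0, and use 1.0 when no target is seen yet.
        fdrs.append(min(1.0, n_decoy / n_target) if n_target else 1.0)

    # PSMs with equal scores share the counts at the end of their tie group.
    for rank in range(n - 2, -1, -1):
        if psms[order[rank]][0] == psms[order[rank + 1]][0]:
            fdrs[rank] = fdrs[rank + 1]

    # Backward pass: q-value = smallest FDR at this or any looser threshold.
    q_values = [0.0] * n
    running_min = float("inf")
    for rank in range(n - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values
```

As a usage example, `compute_q_values([(1e-12, False), (1e-11, False), (1e-10, True), (1e-9, False), (1e-8, True)])` gives 0, 0, 1/3, 1/3 and 2/3, while `compute_q_values([(1e-10, True), (1e-9, False)])`, where the best-scoring PSM is a decoy, gives 1.0 for both PSMs.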

Linkous-02 commented 3 years ago

Thanks for your reply. To clarify, I ran MSGF+ this way because I am curious about the spectra that are filtered out by the target-decoy strategy. By chance I found that the Q-values in the output did not fit the formula in the article, so I would like to understand why.

I also computed Q-values myself in the file attached below, but the manually computed Q-values seem to be very different from the MSGF+ output.

I would be very grateful if you could give me some clue.

-- Qvalue

alchemistmatt commented 3 years ago

Those SpecEValues are not good, and you have negative MSGFScore values. Something odd is going on. As I said earlier, you can only compute Q-values using the Reverse / Forward method when you search the entire, unfiltered original .mzML file.

MihirMongia commented 1 year ago

Hi Linkous-02, did you ever manage to resolve this issue? I am a beginner with MSGF+ and I am also having trouble replicating the Q-values.