MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Number of peptides in E/p-value calculations #90

Open mrForce opened 4 years ago

mrForce commented 4 years ago

Describe the question or problem

In MZIdentMLGen.java, the spectral E-value is multiplied by the number of unique peptides in the database with the same length as the matched peptide to get the final E-value. This makes sense. However, on line 826 of DBScanner.java, the same count is also used for the multiple-testing correction when computing the p-value:

p-value = 1 - (1 - specProb)^N

where N is the number of peptides in the database with the same length as the matching peptide. I would have thought that N should be the number of peptides actually scored against the spectrum (i.e. the number of tests). I understand that this part of the code comes from MSGFDB and is likely very old, but I would find it helpful to know why it was done this way.
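
For readers following along, here is a minimal Java sketch of the two corrections described above. It is not taken from MS-GF+: the class and method names (MultipleTestingSketch, eValue, pValue) are hypothetical, and the log1p/expm1 form is an assumption made here for numerical stability, not necessarily how DBScanner.java computes it. The sketch only illustrates how the choice of N enters both quantities.

```java
/**
 * Illustrative sketch only; this is NOT code from MS-GF+.
 * All names here are hypothetical.
 */
final class MultipleTestingSketch {

    private MultipleTestingSketch() {}

    /**
     * E-value: expected number of chance matches at least this good
     * among n candidate peptides, i.e. specProb * n.
     */
    static double eValue(double specProb, long n) {
        return specProb * n;
    }

    /**
     * p-value = 1 - (1 - specProb)^n: probability that at least one of
     * n independent candidates matches this well by chance.
     * Written as -expm1(n * log1p(-specProb)) so that tiny values of
     * specProb are not lost to rounding in (1 - specProb).
     */
    static double pValue(double specProb, long n) {
        if (n == 0) {
            return 0.0; // no tests performed, so no chance match is possible
        }
        return -Math.expm1(n * Math.log1p(-specProb));
    }
}
```

With specProb = 1e-10 and N = 1,000,000, eValue returns about 1e-4 and pValue returns a slightly smaller number, so the two corrections nearly coincide whenever specProb * N is small. Whichever definition of N is used (same-length peptides in the database, or peptides actually scored against the spectrum), it scales both results in the same way.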

Details

The reason I'm asking is that my advisor and I are trying to see whether we can modify MS-GF+ to estimate FDR directly, without using a decoy database. This raises a tangential question: it looks like MSGFDB was able to compute FDRs using the formulas found here, but this functionality doesn't appear in MS-GF+. Is there a reason for this?

alchemistmatt commented 4 years ago

Regarding the tangential question, I believe Sangtae Kim removed that option when he switched from p-values and FDR to E-values and QValues. This update also included renaming the program from MSGFDB to MS-GF+. I do not know the details of this change.