Open mrForce opened 4 years ago
Regarding the tangential question, I believe Sangtae Kim removed that option when he switched from p-values and FDR to E-values and QValues. That update also included renaming the program from MSGFDB to MS-GF+. I do not know the details behind the change.
Describe the question or problem

In MZIdentMLGen.java, the spectral E-value is multiplied by the number of unique peptides in the database with the same length as the peptide in the match, to get the final E-value. This makes sense. However, in DBScanner.java on line 826, the number of unique peptides in the database with the same length as the matching peptide is also used for the multiple testing correction when computing the p-value, as:

p-value = 1 - (1 - specProb)^N

where N is the number of peptides in the database with the same length as the matching peptide. I would have thought that N should be the number of peptides we searched against the spectrum (i.e. the number of tests). I understand that this part of the code is from MSGFDB, and is likely very old, but I would find it helpful to know why this was done this way.
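To make the two corrections concrete, here is a minimal sketch of both formulas as described above. It is not the MS-GF+ code: the class name, the method names, and the example inputs are hypothetical, and I am assuming that the same per-spectrum probability (specProb) feeds both formulas. The p-value is computed with log1p/expm1 because 1 - (1 - specProb)^N loses precision when specProb is very small.

```java
public class MultipleTestingSketch {

    // E-value style correction: expected number of database peptides
    // scoring at least this well by chance (specProb times N).
    static double eValue(double specProb, long n) {
        return specProb * n;
    }

    // p-value style correction: probability that at least one of N
    // independent tests reaches this score, 1 - (1 - specProb)^N.
    // Rewritten as -expm1(N * log1p(-specProb)) for numerical stability.
    static double pValue(double specProb, long n) {
        return -Math.expm1(n * Math.log1p(-specProb));
    }

    public static void main(String[] args) {
        double specProb = 1e-10;   // hypothetical spectral probability
        long n = 1_000_000L;       // hypothetical N (number of tests)

        System.out.println("E-value: " + eValue(specProb, n));
        System.out.println("p-value: " + pValue(specProb, n));
    }
}
```

For small specProb * N the two numbers are nearly identical, and the p-value never exceeds the E-value, so the question is mainly about which N is the right count of tests rather than which formula is used.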
Details

I'm asking because my advisor and I are trying to see whether we can modify MS-GF+ to estimate FDR directly, without using a decoy database. This brings up a tangential question: it looks like MSGFDB had the ability to compute FDRs using the formulas found here, but this functionality doesn't appear in MS-GF+. Is there a reason for this?