Nesvilab / philosopher

PeptideProphet, PTMProphet, ProteinProphet, iProphet, Abacus, and FDR filtering
https://philosopher.nesvilab.org
GNU General Public License v3.0

Interpretation of report tables #13

Closed tomthun closed 6 years ago

tomthun commented 6 years ago

Dear Dev-Team,

I am currently working with the reports produced by philosopher.exe report, which I obtained from the MSFragger GUI. When the reports are finished, four tables are produced for both open and closed searches: report.tsv, peptide.tsv, psm.tsv, and modifications.tsv or ion.tsv. I don't understand how to obtain a list of identified protein groups, since the Protein ID column in report.tsv contains only unique IDs, but the Indistinguishable Proteins column is empty.

Is there more documentation available on the reports? I only found https://prvst.github.io/philosopher/report.html.

Thanks, Tom

prvst commented 6 years ago

@tomthun

We will soon have more information about the tables on the website. Because we are still working on their content, we decided to wait a little longer before documenting them.

Can you tell me which version you are running and what kind of database you are using?

tomthun commented 6 years ago

@prvst I am using the latest versions of MSFragger and MSFragger GUI, and this Philosopher version: https://github.com/prvst/philosopher/releases/tag/20180130

Right now I am using the HEK data: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD001468 and a human database with decoys created by Philosopher: http://www.uniprot.org/proteomes/UP000005640 (link to the version without decoys)

For a better understanding: I am so interested in the output because I am comparing MSFragger+Philosopher with MaxQuant. Open searching in MSFragger is especially exciting as a comparison to the dependent peptide search in MQ. I would therefore appreciate any documentation that helps with interpreting the tables, e.g. the grouping of proteins. Right now I am just guessing from column labels and common sense (same index = same group).
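The "same index = same group" guess can be checked with a few lines of Python. This is only a minimal sketch: the name of the index column (`Group` here) is an assumption, since the thread does not say what it is called, and the rows are made up. `Protein ID` and `Indistinguishable Proteins` are the column names mentioned above.

```python
import csv
import io
from collections import defaultdict


def group_proteins(tsv_text, group_col="Group", id_col="Protein ID"):
    """Map each value of group_col to the list of protein IDs that carry it.

    group_col is a hypothetical column name; pass whatever index column
    your report.tsv actually has.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        groups[row[group_col]].append(row[id_col])
    return dict(groups)


# Made-up rows standing in for report.tsv (accessions are invented).
sample = (
    "Group\tProtein ID\tIndistinguishable Proteins\n"
    "1\tP00001\t\n"
    "2\tP00002\t\n"
    "2\tP00003\t\n"
)

grouped = group_proteins(sample)
for index, members in grouped.items():
    print(index, members)
```

For a real file, the text could be read with `open("report.tsv", newline="").read()` and passed to the same function. Whether rows that share an index really form one ProteinProphet group is exactly the open question here, so this only shows how to test the guess.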

Thanks for your efforts!

prvst commented 6 years ago

Proteins are inferred and grouped according to the logic implemented in ProteinProphet; you might want to check some of Alexey's papers describing how the program works [1] [2]. If you feel comfortable, you might also take a look at your protXML file, where you can see how the protein groups were formed and organized. Since we are not the official maintainers of ProteinProphet, I'm not planning to add any documentation on how the software works. Sorry about that.

Regarding the results you see there: the indistinguishable list contains only proteins that cannot be differentiated from one another, which is why you don't see many of them. Seeing few of them is especially common if you are using a database like RefSeq or Swiss-Prot. If you add isoforms or variants to it, you will see more cases like this. It is also important to note that this is different from proteins that share peptides because of a common region.
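For anyone who wants to look at the groups in the protXML file without reading the XML by hand, here is a rough Python sketch. It assumes the usual protXML layout, i.e. `protein_group` elements containing `protein` elements, which may in turn contain `indistinguishable_protein` children; please check these names against your own file. The sample document and its accessions are made up.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}protein'."""
    return tag.rsplit("}", 1)[-1]


def read_protein_groups(xml_text):
    """Return {group_number: [(protein_name, [indistinguishable names]), ...]}."""
    root = ET.fromstring(xml_text)
    groups = {}
    for group in root.iter():
        if local(group.tag) != "protein_group":
            continue
        entries = []
        for protein in group:
            if local(protein.tag) != "protein":
                continue
            indistinguishable = [
                child.get("protein_name")
                for child in protein
                if local(child.tag) == "indistinguishable_protein"
            ]
            entries.append((protein.get("protein_name"), indistinguishable))
        groups[group.get("group_number")] = entries
    return groups


# Made-up miniature protXML; real files carry many more attributes.
sample = """<protein_summary xmlns="http://regis-web.systemsbiology.net/protXML">
  <protein_group group_number="1" probability="1.0">
    <protein protein_name="P00001" probability="1.0"/>
  </protein_group>
  <protein_group group_number="2" probability="0.99">
    <protein protein_name="P00002" probability="0.99">
      <indistinguishable_protein protein_name="P00002-2"/>
    </protein>
    <protein protein_name="P00003" probability="0.0"/>
  </protein_group>
</protein_summary>"""

protein_groups = read_protein_groups(sample)
for number, entries in protein_groups.items():
    print(number, entries)
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`. Matching on local tag names avoids hard-coding the namespace, which may differ between tool versions.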
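To make the difference concrete, here is a small toy example in Python. It is a simplification for illustration only, not ProteinProphet's actual algorithm: proteins are treated as indistinguishable when their sets of identified peptides are identical, while proteins that merely overlap in some peptides stay separate entries. All protein names and peptide strings are invented.

```python
from collections import defaultdict


def indistinguishable_sets(protein_peptides):
    """Group proteins whose identified peptide sets are exactly the same."""
    by_evidence = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_evidence[frozenset(peptides)].append(protein)
    return [sorted(names) for names in by_evidence.values() if len(names) > 1]


def shared_but_distinguishable(protein_peptides):
    """Pairs that share at least one peptide but differ in their evidence."""
    names = sorted(protein_peptides)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pep_a = set(protein_peptides[a])
            pep_b = set(protein_peptides[b])
            if pep_a & pep_b and pep_a != pep_b:
                pairs.append((a, b))
    return pairs


# Invented evidence: an isoform with identical peptides, plus a protein
# that only shares a common region with the other two.
example = {
    "PROT_A": {"PEPTIDEK", "SAMPLER"},
    "PROT_A_isoform": {"PEPTIDEK", "SAMPLER"},
    "PROT_B": {"SAMPLER", "UNIQUEK"},
}

same_evidence = indistinguishable_sets(example)
overlapping = shared_but_distinguishable(example)
print(same_evidence)
print(overlapping)
```

In this toy data only PROT_A and its isoform would end up in an indistinguishable list. PROT_B shares a peptide with both but has its own unique evidence, so it stays distinguishable. That is also why a database without isoforms or variants produces few indistinguishable entries.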

Please feel free to reopen the issue if you still have other concerns or questions.