Xcorr score for PSMs using ReportCLI

nattzy94 commented 3 years ago

Hi,

Is it possible to include the XCorr score in the PSM report generated by ReportCLI? I can't seem to find a way. Alternatively, other scores like the PSM p-value or e-value would also work. I would like to check whether PSMs get higher scores depending on the protein database used.

mvaudel commented 3 years ago

Hi,

Thank you for contacting us about this. I am afraid we don't have an easy solution at the moment, apologies for this and the lack of documentation. Below are ways to extract the data that you need and considerations on scores.

Getting XCorr/p-value/e-value: We currently do not compute these; we just take the output of the search engine. You can therefore only export the score that was used to rank peptides, typically the p-value provided by the search engine. For your specific question, I would recommend extracting the score of interest directly from the search engine results in the environment that you use for data interpretation (Bioconductor should have parsers for R, and Pyteomics for Python).
Extend reports to include the score: If you want to create a new report, the easiest way is to do it in the tool's user interface: open the Export -> Identification Features menu and click "Add new report type". The feature you are looking for is called "Algorithm Raw Score" under "Identification Algorithm Results". This will export the score/p-value/e-value provided by the algorithm. Note that what the score represents depends on the search engine. Once saved, you can use your custom report from the ReportCLI. If you want to use this report on another computer, note that it is saved under ~/.peptideshaker/exportFactory.json; you can pass this file to the ReportCLI using the -peptideshaker_exports option. I am sorry that this is so convoluted and poorly documented; we are working on improving the portability of reports.
Score dependence on the database: By definition, the match between a peptide sequence (plus modifications) and a spectrum, e.g. a correlation, does not depend on the database. Increasing the size of the database will, however, increase the likelihood that an incorrect match scores close to, equal to, or even better than the correct sequence. How well the match between your peptide and spectrum fares compared to all other peptides, which is typically what p/e-values attempt to capture, thus depends on the size of the database. We looked into how much of a difference this makes in a review a while ago, see figure 4 in https://doi.org/10.1002/mas.21543. Hopefully it will be helpful.
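
To make the database-size effect concrete, here is a minimal sketch. It assumes that every incorrect candidate independently reaches the observed score with the same probability, which is a simplification; the function names and the numbers are illustrative only and are not taken from any search engine.

```python
def prob_random_hit(p_single: float, n_candidates: int) -> float:
    """Probability that at least one of n independent random candidates
    reaches a score whose per-candidate p-value is p_single."""
    return 1.0 - (1.0 - p_single) ** n_candidates


def expected_random_hits(p_single: float, n_candidates: int) -> float:
    """Expected number of random candidates reaching that score
    (per-candidate p-value times number of candidates)."""
    return p_single * n_candidates


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-candidate p-value of the observed score
    for n in (10_000, 100_000, 1_000_000):  # hypothetical candidate counts
        print(n, prob_random_hit(p, n), expected_random_hits(p, n))
```

The raw match score stays the same in all three cases; only the chance that a random candidate does as well grows with the number of candidates.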

nattzy94 commented 3 years ago

Thanks for the thorough reply and helpful links. I am learning a lot from you guys.

> Once saved, you can use your custom report from the ReportCLI. If you want to use this report on another computer, your report is saved under ~/.peptideshaker/exportFactory.json, you can pass this file to the ReportCLI using the -peptideshaker_exports command

What I get from this is that I can export a custom report containing the search engine score using the GUI. And then use the resulting ~/.peptideshaker/exportFactory.json file as a template to generate the same custom report for other searches?

hbarsnes commented 3 years ago

> What I get from this is that I can export a custom report containing the search engine score using the GUI. And then use the resulting ~/.peptideshaker/exportFactory.json file as a template to generate the same custom report for other searches?

Correct. You create and save the custom report via the GUI and then reuse the exportFactory.json file whenever you want to generate the same reports via the command line.

To see the report numbers of your custom reports, simply run the ReportCLI command line without any input options.

nattzy94 commented 2 years ago

Hi,

I tried to extend the reports as you described and managed to get the custom report I wanted. However, I do not have an exportFactory.json file. I only have an exportFactory.cus file in the ~/.peptideshaker folder (I am using macOS). Are these equivalent?

I would also like to obtain the computed FDR for each peptide. Is it possible to do so by extending the reports as previously described?

mvaudel commented 2 years ago

Hi, I am afraid exportFactory.cus is from an older version; you would need the .json file. Please make sure that you are using the latest version of PeptideShaker. Note that if you are using the temp folder version, the files will be saved in the temp folder. I am afraid we don't provide the FDR at each score, but it can easily be computed, either by counting the decoys above a given score or by summing the PEP (i.e. 1 - confidence). Hope it helps, Marc
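
The PEP-based estimate could be sketched as follows. This is only an illustration: it assumes the confidence is exported as a percentage, so that PEP = 1 - confidence/100 (check the scale in your own report), and the function name is made up. The running sum of PEPs estimates the number of false positives among the retained hits, and dividing by the number of retained hits gives the rate.

```python
def fdr_from_pep(peps):
    """Estimate the FDR at each rank as the running sum of PEPs
    divided by the number of hits retained so far.
    `peps` must be ordered from most to least confident hit."""
    fdrs = []
    running = 0.0
    for rank, pep in enumerate(peps, start=1):
        running += pep
        fdrs.append(running / rank)
    return fdrs


if __name__ == "__main__":
    confidences = [100.0, 99.0, 95.0, 60.0]  # made-up values, in percent
    peps = [1.0 - c / 100.0 for c in confidences]
    print(fdr_from_pep(peps))
```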

nattzy94 commented 2 years ago

Hi,

I managed to get the algorithm score report by extending the score reports as previously suggested. I would like to check if my method to obtain the FDR is correct. This is as follows:

1. Order the dataframe by search engine score. In my case, it is the e-value (smaller is better) as I am using X!Tandem.
2. For row/peptide i, count the number of preceding targets and decoys based on the Decoy column.
3. Calculate the FDR for peptide i as numDecoys/numTargets.
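
The steps above can be sketched in plain Python as below. This is a rough illustration rather than what PeptideShaker does internally: the tuple layout and function names are hypothetical, and the counts here include row i itself, which is an assumption about how "preceding" is meant.

```python
def decoy_fdr(rows):
    """rows: iterable of (e_value, is_decoy) tuples.
    Returns a list of (e_value, is_decoy, fdr) sorted by e-value,
    where fdr = decoys so far / targets so far (current row included)."""
    ordered = sorted(rows, key=lambda r: r[0])  # smaller e-value is better
    n_targets = 0
    n_decoys = 0
    out = []
    for e_value, is_decoy in ordered:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        # With no target seen yet the ratio is undefined; report 1.0.
        fdr = n_decoys / n_targets if n_targets else 1.0
        out.append((e_value, is_decoy, fdr))
    return out


def accepted_targets(rows, threshold=0.01):
    """Targets ranked at or above the last row whose FDR is within threshold."""
    scored = decoy_fdr(rows)
    last_ok = -1
    for i, (_, _, fdr) in enumerate(scored):
        if fdr <= threshold:
            last_ok = i
    return [r for r in scored[: last_ok + 1] if not r[1]]
```

Cutting at the last row that satisfies the threshold, rather than the first row that exceeds it, is a design choice: the raw ratio is not monotonic along the ranked list, so the two cut-offs can give different counts.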

Based on this, I got 5,557 target peptides (4,649 unique) at FDR 1%. In the default peptide report from PeptideShaker, I get 5,144 peptides (4,625 unique). The overlap of the two peptide lists is not 100%: there are 25 peptides that are only in the PeptideShaker output at FDR 1% and are missing from my list.

These results are broadly similar, but I would like to check whether I have gone wrong anywhere, so that I can reproduce the numbers in the PeptideShaker reports. I have attached a copy of my algorithm score report here.

NT029_TEAB_Algotithm_Raw_Score_Report.txt

mvaudel commented 2 years ago

Hi, Thanks for the details, this all looks good as far as I can tell. I think that the report you are looking at contains all the results from X!Tandem, whereas PeptideShaker keeps only one peptide per spectrum. This is most likely the main difference. Hope it helps, Marc
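
One way to approximate the "one peptide per spectrum" behaviour before computing the FDR is to keep only the best-scoring row for each spectrum. The sketch below assumes rows with hypothetical keys 'spectrum', 'peptide' and 'e_value'; the actual column names in the exported report will differ, and PeptideShaker's own selection may use other criteria than the raw e-value.

```python
def best_hit_per_spectrum(rows):
    """rows: iterable of dicts with hypothetical keys
    'spectrum', 'peptide' and 'e_value'.
    Keeps the row with the lowest e-value for each spectrum;
    on ties, the first row encountered is kept."""
    best = {}
    for row in rows:
        key = row["spectrum"]
        if key not in best or row["e_value"] < best[key]["e_value"]:
            best[key] = row
    return list(best.values())


if __name__ == "__main__":
    example = [  # made-up rows for illustration
        {"spectrum": "s1", "peptide": "PEPTIDEK", "e_value": 1e-4},
        {"spectrum": "s1", "peptide": "PEPTLDEK", "e_value": 1e-2},
        {"spectrum": "s2", "peptide": "SAMPLER", "e_value": 5e-3},
    ]
    print(best_hit_per_spectrum(example))
```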

compomics / peptide-shaker

Xcorr score for PSMs using ReportCLI #463