nattzy94 closed this issue 3 years ago.
Hi,
Thank you for contacting us about this. I am afraid we don't have an easy solution at the moment; apologies for this and for the lack of documentation. Below are ways to extract the data that you need, along with some considerations on scores.
Getting XCorr/p-value/e-value
We currently do not compute these; we just take the output of the search engine. You can therefore only export the score that was used to rank peptides, typically the p-value provided by the search engine. For your specific question, I would recommend extracting the score of interest from the search engine results directly in the environment that you use for data interpretation (Bioconductor should have parsers for R, and pyteomics for Python).
Extend reports to include the score
If you want to create a new report, the easiest way is via the user interface of the tool: open the Export -> Identification Features menu and click "Add new report type". The feature you are looking for is called "Algorithm Raw Score" under "Identification Algorithm Results". This will export the score/p-value/e-value provided by the algorithm. Note that what the score represents depends on the search engine.
Once saved, you can use your custom report from the ReportCLI. If you want to use this report on another computer, note that your report is saved under ~/.peptideshaker/exportFactory.json; you can pass this file to the ReportCLI using the -peptideshaker_exports option. I am sorry that this is so convoluted and poorly documented; we are working on improving the portability of reports.
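Once the custom report has been exported, it can be loaded in Python for further analysis. The sketch below assumes a tab-separated text export and uses hypothetical column titles (`Sequence`, `Algorithm Raw Score`); check the header line of your own file, since the actual titles and layout depend on the features selected in the report.

```python
import csv
import io
from typing import IO, List, Tuple

# Hypothetical column titles: check the header line of your own export.
SEQUENCE_COLUMN = "Sequence"
SCORE_COLUMN = "Algorithm Raw Score"


def read_scores(handle: IO[str],
                sequence_column: str = SEQUENCE_COLUMN,
                score_column: str = SCORE_COLUMN) -> List[Tuple[str, float]]:
    """Return (sequence, score) pairs from a tab-separated report, skipping empty scores."""
    reader = csv.DictReader(handle, delimiter="\t")
    scores = []
    for row in reader:
        # Missing or empty score cells are skipped rather than parsed.
        raw = (row.get(score_column) or "").strip()
        if not raw:
            continue
        scores.append((row[sequence_column], float(raw)))
    return scores


# In-memory example standing in for an exported report file.
example = io.StringIO(
    "Sequence\tAlgorithm Raw Score\n"
    "PEPTIDEK\t0.0012\n"
    "ELVISLIVESK\t3.4e-05\n"
)
print(read_scores(example))
```

For a real export, open the file with `open(path, newline="")` and pass the handle to `read_scores`. Remember that the meaning of the score (raw score, p-value or e-value) depends on the search engine that produced it.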
Score dependence on the database
By definition, the match between a peptide sequence with its modifications and a spectrum (e.g., a correlation) does not depend on the database. Increasing the size of the database will, however, increase the likelihood that an incorrect match scores close to, equal to, or even better than the correct sequence. How well the match between your peptide and spectrum fares compared to all other peptides, which is typically what p-values and e-values attempt to capture, therefore depends on the size of the database. We looked into how much of a difference this makes in a review a while ago; see figure 4 in https://doi.org/10.1002/mas.21543. Hopefully it will be helpful.
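To illustrate why p/e-values depend on the database while the raw match score does not, here is a deliberately simplified model (an assumption for illustration, not how any particular search engine computes its statistics): if each candidate peptide independently has a probability p of matching the spectrum at least as well as the observed peptide by chance, then with N candidates the expected number of such random matches is N·p, and the probability of at least one is 1 - (1 - p)^N. A larger database typically brings more candidates within the precursor tolerance, i.e. a larger N.

```python
def expected_random_matches(p_single: float, n_candidates: int) -> float:
    """Simplified e-value: expected number of random candidates scoring as well (N * p)."""
    return n_candidates * p_single


def prob_any_random_match(p_single: float, n_candidates: int) -> float:
    """Simplified p-value: chance that at least one of N random candidates scores as well."""
    return 1.0 - (1.0 - p_single) ** n_candidates


# Same spectrum and same peptide, hence the same raw score and the same
# per-candidate probability p; only the number of candidates changes.
p = 1e-6
for n in (10_000, 100_000, 1_000_000):
    print(n, expected_random_matches(p, n), prob_any_random_match(p, n))
```

In this toy model, a hundredfold increase in candidates makes the same match a hundred times less significant by e-value, even though the underlying spectrum-peptide score is unchanged. Real search engines estimate these statistics in their own ways, so the actual effect should be checked on your data.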
Thanks for the thorough reply and helpful links. I am learning a lot from you guys.
What I get from this is that I can export a custom report containing the search engine score using the GUI, and then use the resulting ~/.peptideshaker/exportFactory.json file as a template to generate the same custom report for other searches?
Correct. You create and save the custom report via the GUI and then reuse the exportFactory.json file when you want to generate the same reports via the command line.
To see the report numbers of your custom reports, simply run the ReportCLI command line without any input options.
Hi,
I tried to extend the reports as you described and managed to get the custom report I wanted. However, I do not have an exportFactory.json file. I only have an exportFactory.cus file in the ~/.peptideshaker folder (I am using macOS). Are these equivalent?
I would also like to obtain the computed FDR for each peptide. Is it possible to do so by extending the reports as previously described?
Hi,
I am afraid the exportFactory.cus file is from an older version; you need the .json file. Please make sure that you are using the latest version of PeptideShaker. Note that if you are using the temp folder version, the files will be saved in the temp folder.
I am afraid we don't provide the FDR at each score, but it can easily be computed, either by counting the decoys above a given score relative to the targets, or by summing the PEP (i.e., 1 - confidence) of the matches above that score and dividing by their number.
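Both estimates can be sketched in a few lines of Python. This assumes each match is a hypothetical `(score, is_decoy, pep)` tuple in which a higher score is better (negate p-values or e-values first), and that the PEP is 1 - confidence, or 1 - confidence / 100 if the confidence is given as a percentage.

```python
from typing import List, Tuple

# Hypothetical input: (score, is_decoy, pep); higher score = better.
Match = Tuple[float, bool, float]


def decoy_fdr(matches: List[Match], threshold: float) -> float:
    """Target-decoy estimate: decoys / targets among matches scoring >= threshold."""
    targets = sum(1 for s, d, _ in matches if s >= threshold and not d)
    decoys = sum(1 for s, d, _ in matches if s >= threshold and d)
    return decoys / targets if targets else 0.0


def pep_fdr(matches: List[Match], threshold: float) -> float:
    """PEP-based estimate: summed PEP of targets >= threshold, divided by their count."""
    peps = [pep for s, d, pep in matches if s >= threshold and not d]
    return sum(peps) / len(peps) if peps else 0.0


def pep_from_confidence(confidence_percent: float) -> float:
    """PEP from a confidence given in percent (assumes PEP = 1 - confidence / 100)."""
    return 1.0 - confidence_percent / 100.0


example = [(50.0, False, 0.001), (45.0, False, 0.002), (40.0, True, 0.6),
           (38.0, False, 0.05), (30.0, True, 0.9)]
print(decoy_fdr(example, 38.0))  # 1 decoy / 3 targets
print(pep_fdr(example, 38.0))    # (0.001 + 0.002 + 0.05) / 3
```

The summed PEP is the expected number of false positives above the threshold, so dividing it by the number of accepted matches turns it into an FDR estimate; the decoy count plays the same role in the target-decoy approach.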
Hope it helps,
Marc
Hi,
I managed to get the algorithm score report by extending the reports as previously suggested. I would like to check whether my method to obtain the FDR is correct. It is as follows:
Decoy column. Based on this, I got 5,557 target peptides (4,649 unique) at 1% FDR. In the default peptide report from PeptideShaker, I get 5,144 peptides (4,625 unique). The overlap of the two peptide lists is not 100%: there are 25 peptides that are only in the PeptideShaker output at 1% FDR and missing from my list.
These results are broadly similar, but I would like to check whether I may have gone wrong anywhere, so that I can get the same numbers as in the PeptideShaker reports. I have attached a copy of my algorithm score report here.
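For reference, a common way to apply a 1% FDR threshold with target-decoy counting is to rank the matches by score, compute the decoy/target ratio at each rank, and convert it into a q-value (the lowest FDR at which a match would still be accepted). The sketch below assumes hypothetical `(sequence, score, is_decoy)` tuples with higher scores being better and does not handle tied scores; it is not PeptideShaker's implementation.

```python
from typing import List, Tuple

# Hypothetical input: (sequence, score, is_decoy); higher score = better.
Match = Tuple[str, float, bool]


def q_values(matches: List[Match]) -> List[Tuple[str, float, bool, float]]:
    """Rank matches by score and attach a target-decoy q-value to each."""
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # FDR estimate at this rank: decoys / targets seen so far
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: smallest FDR at this rank or any lower-scoring rank
    qs = [0.0] * len(fdrs)
    running_min = 1.0
    for i in reversed(range(len(fdrs))):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(seq, score, d, q) for (seq, score, d), q in zip(ranked, qs)]


def accepted_targets(matches: List[Match], max_fdr: float = 0.01) -> List[str]:
    """Target sequences whose q-value is at or below max_fdr."""
    return [seq for seq, _, d, q in q_values(matches) if not d and q <= max_fdr]


example = [("A", 10.0, False), ("B", 9.0, False), ("C", 8.0, True),
           ("D", 7.0, False), ("E", 6.0, False)]
print(accepted_targets(example))        # ['A', 'B']
print(accepted_targets(example, 0.3))   # ['A', 'B', 'D', 'E']
```

`len(set(accepted_targets(matches)))` gives the number of unique peptides, and a set difference against the sequences from the PeptideShaker report lists the peptides found in only one of the two outputs.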
Hi,
Thanks for the details; this all looks good as far as I can tell. I think that the report you are looking at contains all the results from X!Tandem, whereas PeptideShaker keeps only one peptide per spectrum; this is most likely the main difference.
Hope it helps,
Marc
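To bring an engine-level report closer to a one-peptide-per-spectrum output, one option is to keep only the best match per spectrum before estimating the FDR. This minimal sketch assumes hypothetical `(spectrum_id, sequence, score, is_decoy)` tuples with higher scores being better; it only approximates the difference and does not reproduce how PeptideShaker itself selects the match for each spectrum.

```python
from typing import Dict, List, Tuple

# Hypothetical PSM tuple: (spectrum_id, sequence, score, is_decoy); higher score = better.
PSM = Tuple[str, str, float, bool]


def best_per_spectrum(psms: List[PSM]) -> List[PSM]:
    """Keep only the highest-scoring match for each spectrum (first one wins on ties)."""
    best: Dict[str, PSM] = {}
    for psm in psms:
        spectrum_id, _, score, _ = psm
        current = best.get(spectrum_id)
        if current is None or score > current[2]:
            best[spectrum_id] = psm
    return list(best.values())


psms = [("s1", "AAK", 10.0, False), ("s1", "BBK", 12.0, True),
        ("s2", "CCK", 5.0, False)]
print(best_per_spectrum(psms))
```

Note that a decoy can win a spectrum and push out a target, which is intended: the filtered list is then used for the decoy-based FDR estimate as before.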
Hi,
Is it possible to include the XCorr score in the PSM report generated by ReportCLI? I can't seem to find a way. Alternatively, other scores like the PSM p-value or e-value would also work. I would like to check whether PSMs get higher scores depending on the protein database in use.