Closed jpfeuffer closed 2 years ago
Hi Julianus! @jpfeuffer An example is here: pmultiqc/multiqc_report.html
1) Comet xcorr or MSGF SpecEvalue (the mean of each file)
2) PEPs for both search engines (the mean of all the files)
3) Question: Barplot on number of matching search engine IDs per PSM
I haven't found any PSM quantitative information in the idXMLs yet. Does this need to be counted manually? If so, what should be counted: for example, the frequency of each sequence in PeptideHits?
Hi! Thanks a lot, that is a good start.
1) How hard would it be to do a histogram, as we did for most of the other diagrams? I think the mean is too uninformative. Maybe we could have the files for selection up there (where Comet and MSGF are now), and then have different plots for the search engines.
2) Same comment.
3) Here I meant more the agreement over different search engines. This could be done by checking the "support" metavalue in the idXML after consensusID. Again, a histogram would be better.
Got it. I will also share an example when I finish it.
Hi Julianus! One more thing to confirm: do you want to use the Histogram class to count and plot different ranges of search scores and PEPs? Its current function is to count and plot data from specific ranges or values. In addition, plotting the search score or PEP for each PSM results in flat images where the toolbox functions are disabled.
Hi! Yes, the histogram should support arbitrary ranges with a start value, an end value and a number of bins. We then have to find a good range for the search scores. Flat images are OK for now.
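A minimal sketch of binning with a start value, an end value and a number of bins, assuming the scores are already available as plain floats. The function name `bin_scores` and its parameters are hypothetical and not part of the existing Histogram class; skipping out-of-range values is just one possible choice.

```python
from typing import Iterable, List, Tuple


def bin_scores(
    values: Iterable[float], start: float, end: float, n_bins: int
) -> List[Tuple[float, float, int]]:
    """Count values into n_bins equal-width bins covering [start, end].

    Returns a list of (bin_start, bin_end, count) tuples.
    """
    if n_bins <= 0 or end <= start:
        raise ValueError("need n_bins > 0 and end > start")
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Skip values outside the range (this also skips NaN).
        if not (start <= v <= end):
            continue
        # A value equal to `end` is placed in the last bin.
        idx = min(int((v - start) / width), n_bins - 1)
        counts[idx] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]
```

For example, `bin_scores(peps, 0.0, 1.0, 50)` would give 50 bins of width 0.02 for a list of PEPs called `peps`.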
A new example is here: pmultiqc/multiqc_report_6.html
The three sections are as follows:
So far I haven't found any examples of using MultiQC to draw histograms, only bar graphs. The current Histogram class is also used to draw bar graphs.
For now, the range for search scores and PEPs is start=0, end=1, step=0.2.
Hi. I think a histogram is basically a bar graph, so that's fine. I would just use many more bins (around 100?), which simply means decreasing the step size. And maybe we can reduce the space between the bars a little, so that they are a bit closer to each other. Maybe that happens automatically if we increase the number of bins.
Hi Julianus! I used a step size of 0.02, because 100 bars would lead to flat images that could not reflect the quantity information. I then stacked the bars so that they look closer together. An example: pmultiqc/multiqc_report.html
The three sections are as follows:
Yes, this is better. I think stacking is fine. But I think you need to adapt start and end for Xcorr and SpecEvalue. For SpecEvalue, -log10(SpecEvalue) is probably a better value to plot. Both scores can have much broader ranges than 0-1.
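The -log10 transform mentioned here could be sketched as follows. The function name `neg_log10` is hypothetical, and mapping a SpecEvalue of exactly 0 to infinity is an assumption made for this sketch, not something the thread prescribes.

```python
import math


def neg_log10(spec_evalue: float) -> float:
    """Return -log10(spec_evalue) so that smaller E-values give larger scores."""
    if spec_evalue < 0:
        raise ValueError("SpecEvalue must be non-negative")
    if spec_evalue == 0:
        # Assumption: treat an E-value of 0 as an infinitely good score.
        return math.inf
    return -math.log10(spec_evalue)
```

With this transform a SpecEvalue of 1e-10 becomes roughly 10, so the histogram range has to extend well beyond 1.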
Got it.
Hi Julianus! A new Example is here: report
In this section, I use -lg(SpecEvalue) for MSGF+ and |xcorr| for Comet. The range of -lg(SpecEvalue) is start=-1, end=inf, step=0.1. The new bar plot is as follows:
Do I need to remove the first few empty bars (SpecEvalue >= 10) in the plot like this:
Nice. But I kind of thought about adapting the end in the sense that the bins would then cover a larger range. What is the maximum value of the respective scores? It should be much higher. It does not make sense to show a huge bar for the last bin.
Xcorr takes values greater than 0, and an example of its histogram is as follows:
I use the range start=0, end=5, step=0.1, and the result is as follows:
SpecEvalue has a range of 0 to 1, and its negative logarithm has a range of 0 to infinity. I use the range start=0, end=inf, step=0.4, and the results are as follows:
Hi! That looks fine for a start. Maybe we have to adjust the ranges later, after we have seen some more data. Please make sure that they are very easy to edit, maybe with a global constant variable in the module.
e.g. XCORR_HIST_RANGE = (x,y)
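A small sketch of how such module-level constants might look. Only `XCORR_HIST_RANGE` is named in this thread; the other constant names and the helper `bin_edges` are hypothetical, and the numbers simply echo the ranges tried so far as placeholders.

```python
from typing import List, Tuple

# (start, end) and step for each histogram. Placeholder values taken from
# the ranges tried so far; edit them here to change every plot at once.
XCORR_HIST_RANGE: Tuple[float, float] = (0.0, 5.0)
XCORR_HIST_STEP: float = 0.1
PEP_HIST_RANGE: Tuple[float, float] = (0.0, 1.0)
PEP_HIST_STEP: float = 0.02


def bin_edges(hist_range: Tuple[float, float], step: float) -> List[float]:
    """Return the bin edges from start to end in increments of step."""
    start, end = hist_range
    n_bins = round((end - start) / step)
    # Round each edge to suppress floating-point noise such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n_bins + 1)]
```

Keeping the range and step together at module level means a later change to the bins touches one line.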
Other than that, can you upload a full report somewhere? So I can double-check the whole thing?
@WangHong007 I think you can make a proper PR that includes the examples of generation of the reports with the new changes.
New example is here: report
A global constant variable will be added. BTW, the Histogram class needs to be modified to accommodate these changes.
@ypriverol Got it.
Yes, no problem with changing the Histogram class, as long as the old things still work and the class does not become too complicated.
By the way, what logic did you use for the consensus PSMs?
I think it is better if we call it: Number of agreeing search engines per PSM
And then just list the numbers: 1 or 2 (or in the future 3 or 4)
I can see it when you open the PR
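As a rough illustration of the "number of agreeing search engines per PSM" tally, assuming the per-PSM agreement has already been extracted from the idXML as an integer count (how the metavalue is read and encoded is not shown here). The function name `count_agreeing_engines` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_agreeing_engines(engines_per_psm: Iterable[int]) -> Dict[int, int]:
    """Map each number of agreeing search engines to the number of PSMs with it."""
    counts = Counter(engines_per_psm)
    # Sort by number of engines so the bars appear in the order 1, 2, 3, ...
    return dict(sorted(counts.items()))
```

The resulting dictionary has one entry per bar, for example one for PSMs found by a single engine and one for PSMs found by both.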
Based on idXMLs.
Histograms per search engine:
Histogram per file (e.g. dropdown menu) or histogram after merging
Either: Barplot on number of matching search engine IDs per PSM
Or: Scatterplot of number of matching search engine IDs per PSM versus the best PEP/PP score