sjspielman closed this 2 years ago.
However, I also had an idea that these can be y-faceted by score, and we can align by TP53 score(?).
I'm not sure what you mean by "faceted by score." Score is a continuous variable, and I don't see any reason to discretize it.
In terms of the "other" TP53 samples, we can keep them in to show the distribution, but I am really uncomfortable comparing their distribution with the classified activated vs. loss categories using a formal statistical test, because I still don't see a clear hypothesis about what the "other" samples are. In my mind, not having certain mutations (biased by which mutations have already been studied and which we have access to) is an absence of evidence, not evidence for any hypothesis. @jaclyn-taroni, do you have any thoughts?
Should this be median +/- SD?
If anything, it should probably be median + IQR, since SD tends to "go with" the mean, and the median is more appropriate for a nonparametric test. I can update this.
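A median + IQR summary is simple to compute. The project code is R; the sketch below is Python for illustration only, and `median_iqr` is a hypothetical helper. It uses `statistics.quantiles(..., method="inclusive")`, which I believe matches R's default `quantile()` (type 7), but treat that equivalence as an assumption to check before comparing numbers.

```python
import statistics

def median_iqr(values):
    """Return (median, Q1, Q3, IQR) for a list of scores.

    Hypothetical helper for illustration. Quartiles use the 'inclusive'
    method, assumed here to match R's default quantile() (type 7).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3, q3 - q1

med, q1, q3, iqr = median_iqr([1, 2, 3, 4, 5])
print(f"median = {med}, IQR = {iqr} ({q1} to {q3})")
```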
I'm not sure what you mean by "faceted by score." Score is a continuous variable, and I don't see any reason to discretize it.
Oh, I just meant faceting along the y-axis, with the scores still continuous, so the plots are paneled by row, like in the sketch below. The top panel would be TP53 scores and the bottom EXTEND scores; they would share the x-axis of cancer groups, with the groups ordered by median TP53 score as you have them currently.
With faceting they'd end up with a shared axis, so once the full figure PDF is compiled we'll also see whether separate or shared panel labels look better!
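The row-faceted layout amounts to long-format data (one row per sample per score type) plus a single cancer-group order shared by both facets. A minimal Python sketch on toy data, assuming hypothetical column names (`sample`, `cancer_group`, `tp53_score`, `telomerase_score`) and ascending order by median TP53 score; the real figure is built in R.

```python
import statistics
from collections import defaultdict

def to_long(rows, score_keys=("tp53_score", "telomerase_score")):
    """Reshape wide rows (one per sample) into long rows, one per sample per
    score type, so a plotting library can facet rows by 'score_type'."""
    return [
        {"sample": r["sample"], "cancer_group": r["cancer_group"],
         "score_type": key, "score": r[key]}
        for r in rows
        for key in score_keys
    ]

def order_by_median_tp53(rows):
    """Return cancer groups sorted by ascending median TP53 score. Using this
    one order for both facets keeps the shared x-axis aligned."""
    by_group = defaultdict(list)
    for r in rows:
        by_group[r["cancer_group"]].append(r["tp53_score"])
    return sorted(by_group, key=lambda g: statistics.median(by_group[g]))

# Toy data with placeholder group names.
rows = [
    {"sample": "s1", "cancer_group": "A", "tp53_score": 0.9, "telomerase_score": 0.2},
    {"sample": "s2", "cancer_group": "B", "tp53_score": 0.1, "telomerase_score": 0.7},
    {"sample": "s3", "cancer_group": "A", "tp53_score": 0.7, "telomerase_score": 0.4},
]
long_rows = to_long(rows)
group_order = order_by_median_tp53(rows)
```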
I've just pushed some changes:
"Other" is now included in the violin plots, but it's not included in the statistical tests. I moved the p-value labels closer to the lost/activated groupings to help emphasize this, but I can see how this could be confusing.
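Keeping "other" in the plot but out of the test means filtering to the two classified groups before the comparison. The thread doesn't say which test is used; this sketch assumes a Wilcoxon rank-sum (Mann-Whitney U) test and computes only the U statistic in plain Python, with hypothetical field names `tp53_status` and `tp53_score`. A p-value would come from something like R's `wilcox.test()` or SciPy's `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x vs. y: the number of pairs (xi, yj) with
    xi > yj, plus 0.5 for each tie. O(len(x) * len(y)); fine for a sketch."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def compare_activated_vs_lost(rows, status_key="tp53_status", score_key="tp53_score"):
    """Compare only 'activated' and 'lost'; 'other' samples can still be
    plotted but are dropped before the test. Returns (U, n_activated, n_lost)."""
    activated = [r[score_key] for r in rows if r[status_key] == "activated"]
    lost = [r[score_key] for r in rows if r[status_key] == "lost"]
    return mann_whitney_u(activated, lost), len(activated), len(lost)

example = [
    {"tp53_status": "activated", "tp53_score": 0.2},
    {"tp53_status": "activated", "tp53_score": 0.3},
    {"tp53_status": "lost", "tp53_score": 0.8},
    {"tp53_status": "lost", "tp53_score": 0.9},
    {"tp53_status": "other", "tp53_score": 0.5},
]
result = compare_activated_vs_lost(example)
```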
Can you add a line between activated and lost to show that the p-value comparison applies to those groups? Or maybe we do this in Illustrator, @jaclyn-taroni?
Now it's a single figure faceted vertically by score, with the x-axis in TP53 order, and we're using the mutator colors. If we want to use cancer group colors, this plot has to go back to being 2 separate vertical panels. What do we think?
I like this because you can start to see trends that we saw in the correlation plots: some groups have high TP53 and high telomerase scores, but others (meningioma) have the opposite trend.
I have "Telomerase score" as the label. Do we prefer "Normalized EXTEND scores"?
I commented on #1283 that I think either "Telomerase score" or "Telomerase score (EXTEND)" is good.
Updated with stat_pvalue_manual(), removed the old legend file, and updated the expression violin plots to show log(fpkm+1), which is also now reflected in the axis title.
P-values look good!
One more thing: can we print out N, R, and p-values (and adjusted p-values where necessary) for the respective plots for the manuscript legends, or add N in parentheses to each group's x-axis label? I think this info was previously in notebooks and/or TSV files.
Which plots are you referring to? We don't have any correlations in these plots. Do you mean adding N to the TP53 violin plots? I can definitely label those x-axes!
Oh yes, I was generalizing this to the telomerase TERT/TERC plots. I'm fine with whichever way: on the plots, printed in a notebook, or in an exported table.
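For the "adjusted p when necessary" part, the thread doesn't name a correction method. Assuming Benjamini-Hochberg (what R's `p.adjust(method = "BH")` computes), the adjustment can be sketched in plain Python; `bh_adjust` is a hypothetical helper, not project code.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    The p-value at sorted rank k (1-based) of m is scaled by m / k; a running
    minimum taken from the largest rank downward keeps the adjusted values
    monotone, and starting that minimum at 1.0 caps them at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = bh_adjust([0.01, 0.04, 0.03])
```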
Ok, these are over in PR #1283 and already have R labels. For this PR, I'll add sample sizes to the violin plot x-axis labels.
@jharenza Are the labels I added here what you had in mind? I added N= info to the x-axes of the violin plots and to the mutation status legend.
Yes, looks great!
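Building the N= labels is a count-then-format step. The actual plots are made in R; this Python sketch with a hypothetical `labels_with_n` helper only illustrates mapping each group to a "group (N=count)" label, and that exact label format is an assumption.

```python
from collections import Counter

def labels_with_n(groups):
    """Map each group name to an axis label of the form 'name (N=count)'.

    Counter keeps first-seen order, so the labels come out in the order the
    groups first appear in the input."""
    counts = Counter(groups)
    return {group: f"{group} (N={n})" for group, n in counts.items()}

labels = labels_with_n(["lost", "activated", "lost", "other"])
```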
This PR partially addresses Issue #1272 and creates a new script, fig4-tp53-telomerase.R, for populating figure 4 panels. fig4-tp53-telomerase.R produces the following figures:
figures/pdfs/fig4/panels/tp53_stranded_roc_panel.pdf
figures/pdfs/fig4/panels/tp53_scores_by_altered_panel.pdf
figures/pdfs/fig4/panels/tp53_expression_by_altered_panel.pdf
figures/pdfs/fig4/panels/tp53_scores_boxplot_panel.pdf and figures/pdfs/fig4/panels/tp53_scores_boxplot_legend.pdf
figures/pdfs/fig4/panels/telomerase_scores_boxplot_panel.pdf
Notable changes:
fig4-tp53-panel.R has been removed and its code has been integrated into fig4-tp53-telomerase.R. Those output PDFs (panel D) have also been renamed for clarity, since there are now more TP53 panels.
fig4-telomerase-activities.R, and the original PDF figure from that script has been removed.