jaclyn-taroni closed this issue 2 years ago.
Hi @jaclyn-taroni, this looks fine. How about also adding a small pie chart of the tumor descriptors as a panel? I think that information is more valuable than sex. We can highlight in the text that we have a lot of diagnostic high-grade tumors, which traditionally were not biopsied.
Coming in to pick this back up.
Some concepts to explore for this plot:
Closing this issue in favor of a new issue to track subsequent tables and supplementary figures.
Context
Currently we're using a multilayer sunburst plot to represent (from inner to outer layer) the following data:
broad_histology
cancer_group
tumor_descriptor
germline_sex_estimate
Here's a screen shot:
(There is no legend here; we intended to make one, and that would certainly be an improvement, but it would not fully address my concerns.)
As I noted in a Google Slides comment, I am concerned that this is too busy to convey what we're hoping to convey: a reader cannot get a quick sense of the relevant categories in the cohort from this figure, particularly when the sample size is small.
Ideas
I think this is a case where we should rely very heavily on labels for the main text display and split out the information we cannot successfully visualize into a supplemental table.
Figure
I think we should have a stacked bar plot where each bar represents a cancer_group, the stacked fill represents whether a sample has DNA-seq, RNA-seq, or both (or whatever is deemed most important), and the cancer_group bars are grouped by broad_histology (perhaps using facets). See my sketch below:
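The counts behind such a stacked bar plot could be prepared roughly as follows. This is a minimal sketch, not the planned implementation: it assumes each row of the histologies file is read into a dict with keys sample_id, experimental_strategy, broad_histology, and cancer_group, that DNA and RNA assays from the same sample share a sample_id, and that the strategy-to-assay mapping below (DNA_STRATEGIES, RNA_STRATEGIES) is a hypothetical placeholder to be checked against the real values. The plotting step itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from experimental_strategy values to assay type;
# check these against the values actually present in the histologies file.
DNA_STRATEGIES = {"WGS", "WXS", "Targeted Sequencing"}
RNA_STRATEGIES = {"RNA-Seq"}


def assay_category(strategies):
    """Classify a sample as 'DNA-seq', 'RNA-seq', 'both', or 'other'."""
    has_dna = bool(strategies & DNA_STRATEGIES)
    has_rna = bool(strategies & RNA_STRATEGIES)
    if has_dna and has_rna:
        return "both"
    if has_dna:
        return "DNA-seq"
    if has_rna:
        return "RNA-seq"
    return "other"


def stacked_bar_counts(records):
    """Count samples per (broad_histology, cancer_group), split by assay category.

    Each record is a dict with the assumed keys sample_id, experimental_strategy,
    broad_histology, and cancer_group; one sample may span several records,
    one per sequencing assay.
    """
    strategies = defaultdict(set)  # sample_id -> set of experimental_strategy values
    labels = {}  # sample_id -> (broad_histology, cancer_group)
    for rec in records:
        strategies[rec["sample_id"]].add(rec["experimental_strategy"])
        labels[rec["sample_id"]] = (rec["broad_histology"], rec["cancer_group"])

    counts = defaultdict(Counter)
    for sample_id, strats in strategies.items():
        counts[labels[sample_id]][assay_category(strats)] += 1

    # Sorting by broad_histology first keeps cancer groups together for faceting.
    return {key: dict(counts[key]) for key in sorted(counts)}
```

Collapsing to one category per sample before counting means a sample with both DNA-seq and RNA-seq contributes to the "both" segment only, so the bar height is the number of samples in that cancer_group.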
Supplemental table
We can then create a table that tells readers about the other variables, tumor_descriptor and germline_sex_estimate. (It's not obvious to me that this information needs to be coupled with the assay information or that we need to communicate the interaction between these variables; curious readers will have access to the histologies file.) To get counts (and %), we'd group by cancer_group and then sort by broad_histology, and the columns would be:
Next steps
This should be implemented in analyses/sample-distribution-analysis, which I am happy to do myself, provided folks are on board.
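As a rough illustration of the supplemental table described above (counts and %, grouped by cancer_group and sorted by broad_histology), a sketch might look like this. It is hypothetical, not the code planned for analyses/sample-distribution-analysis: it assumes the records are already deduplicated to one dict per sample, that each cancer_group belongs to a single broad_histology, and that missing values are encoded as strings (e.g. "NA"). The column naming scheme "variable: level" is an illustrative choice.

```python
from collections import Counter, defaultdict


def descriptor_table(records, variables=("tumor_descriptor", "germline_sex_estimate")):
    """Build one row per cancer_group with "count (percent%)" for each variable level.

    records: iterable of dicts, one per sample, with the assumed keys
    broad_histology, cancer_group, and every name in `variables`.
    Returns a list of row dicts sorted by broad_histology, then cancer_group.
    """
    groups = defaultdict(lambda: {v: Counter() for v in variables})
    histology = {}  # cancer_group -> broad_histology (assumed one-to-one)
    for rec in records:
        cg = rec["cancer_group"]
        histology[cg] = rec["broad_histology"]
        for v in variables:
            groups[cg][v][rec[v]] += 1

    rows = []
    for cg in sorted(groups, key=lambda g: (histology[g], g)):
        row = {"broad_histology": histology[cg], "cancer_group": cg}
        for v in variables:
            total = sum(groups[cg][v].values())
            for level, n in sorted(groups[cg][v].items()):
                # Percent is relative to the number of samples in this cancer_group.
                row[f"{v}: {level}"] = f"{n} ({100 * n / total:.1f}%)"
        rows.append(row)
    return rows
```

Levels absent from a cancer_group simply have no column in that row; when writing the table out, those cells could be filled with "0 (0.0%)" so every row shares the same columns.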