Split up sample distribution representation between figures and tables, main text and supplement

jaclyn-taroni commented 3 years ago

Context

Currently we're using a multilayer sunburst plot to represent (from inner to outer layer) the following data:

broad_histology
cancer_group
Type of DNA-seq
Presence or absence of RNA-seq
tumor_descriptor
germline_sex_estimate

Here's a screenshot:

(There is no legend here; we intended on making one and that would certainly be an improvement but would not totally ameliorate my concerns.)

As expressed in a Google Slide comment, I am concerned that this is too busy to convey what we're hoping to convey, i.e., there is no getting a quick sense of the relevant categories for the cohort from this figure, particularly when the sample size is small.

Ideas

I think this is a case where we should try to rely very heavily on labels for the main text display and split out information we cannot successfully visualize into a supplemental table.

Figure

I think we should have a stacked bar plot where each bar represents a cancer_group, the stacked fill represents whether a sample has DNA-seq, RNA-seq, or both (or whatever is deemed most important), and the cancer_group bars are grouped by broad_histology (perhaps using facets).

See my sketch below:

Image from iOS (1)
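For illustration only, here is a rough outline of how the tallies behind such a stacked bar plot could be computed. It is written in plain Python rather than in the project's own analysis code, and the `has_dna` / `has_rna` fields, the record layout, and the example values are all assumptions made up for this example; they are not columns of the histologies file.

```python
from collections import Counter, defaultdict

# Hypothetical per-sample records; has_dna / has_rna are assumed fields,
# and the histology / group names are placeholders.
samples = [
    {"broad_histology": "Histology A", "cancer_group": "Group 1", "has_dna": True, "has_rna": True},
    {"broad_histology": "Histology A", "cancer_group": "Group 1", "has_dna": True, "has_rna": False},
    {"broad_histology": "Histology A", "cancer_group": "Group 2", "has_dna": False, "has_rna": True},
    {"broad_histology": "Histology B", "cancer_group": "Group 3", "has_dna": True, "has_rna": True},
    {"broad_histology": "Histology B", "cancer_group": "Group 3", "has_dna": True, "has_rna": True},
]


def assay_category(has_dna, has_rna):
    """Collapse the two assay flags into a single fill category."""
    if has_dna and has_rna:
        return "Both"
    if has_dna:
        return "DNA-seq only"
    if has_rna:
        return "RNA-seq only"
    return "Neither"


def stacked_bar_counts(records):
    """Count samples per (broad_histology, cancer_group) bar and fill category."""
    counts = defaultdict(Counter)
    for record in records:
        bar = (record["broad_histology"], record["cancer_group"])
        counts[bar][assay_category(record["has_dna"], record["has_rna"])] += 1
    return {bar: dict(fills) for bar, fills in counts.items()}


bar_counts = stacked_bar_counts(samples)
```

Each key of `bar_counts` corresponds to one bar (which facet it belongs to, and which cancer group it is), and each inner dictionary gives the heights of the stacked segments.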

Supplemental table

We can then create a table that tells readers about the other variables – tumor_descriptor and germline_sex_estimate. (It's not obvious to me that this information needs to be coupled with the assays info or that we need to communicate the interaction between these variables; curious readers will have access to the histologies file.)

To get counts (and %), we'd group by cancer_group and then sort by broad_histology. The columns would be:

broad histology	cancer group	tumor descriptor	germline sex estimate
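One possible reading of the counting step, given as a plain-Python illustration rather than the eventual implementation: count each value of a variable within a cancer group, express it as a percentage of that group, and sort by broad histology. The record layout and example values are made up, and computing percentages within each cancer group (rather than across the whole cohort) is an assumption.

```python
from collections import Counter

# Hypothetical per-sample records; the values are placeholders.
records = [
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Initial", "germline_sex_estimate": "Male"},
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Initial", "germline_sex_estimate": "Female"},
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Progressive", "germline_sex_estimate": "Male"},
    {"broad_histology": "Histology B", "cancer_group": "Group 2",
     "tumor_descriptor": "Initial", "germline_sex_estimate": "Female"},
]


def count_table(rows, variable):
    """Return (broad_histology, cancer_group, value, count, percent) tuples.

    Percentages are taken within each cancer group (an assumption), and the
    output is sorted by broad_histology, then cancer_group, then value.
    """
    value_counts = Counter(
        (r["broad_histology"], r["cancer_group"], r[variable]) for r in rows
    )
    group_totals = Counter((r["broad_histology"], r["cancer_group"]) for r in rows)
    table = []
    for (histology, group, value), n in sorted(value_counts.items()):
        percent = round(100 * n / group_totals[(histology, group)], 1)
        table.append((histology, group, value, n, percent))
    return table


tumor_rows = count_table(records, "tumor_descriptor")
sex_rows = count_table(records, "germline_sex_estimate")
```

Calling `count_table` once per variable keeps tumor_descriptor and germline_sex_estimate independent, in line with not needing to communicate the interaction between them.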

Next steps

This should be implemented in analyses/sample-distribution-analysis which I am happy to do myself, provided folks are on board.

jharenza commented 2 years ago

Hi @jaclyn-taroni - this looks fine. How about also adding a small pie chart of the tumor descriptors as a panel? I think that information is more valuable than sex. We can highlight in the text that we have a lot of diagnostic high-grade tumors, which traditionally were not biopsied.

sjspielman commented 2 years ago

Coming in to pick this back up.

Some concepts to explore for this plot:

- Horizontal barplots with wrapped (i.e. 2-line) labels may ameliorate excessively large labeling that won't fit in main text.
  - Can also identify suitable abbreviations for cancer groups if still too large
- Tumor descriptor information should be prioritized in main text (per conversations w/ @jaclyn-taroni) and other characteristics in the sunburst (sex, experimental strategy, etc.) can go to SI, or be included in SI as a table.
  - Need to explore using counts vs proportion.

sjspielman commented 2 years ago

Closing this issue in favor of a new issue to track subsequent tables and supplementary figures.

AlexsLemonade / OpenPBTA-analysis