AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Split up sample distribution representation between figures and tables, main text and supplement #1175

Closed jaclyn-taroni closed 2 years ago

jaclyn-taroni commented 3 years ago

Context

Currently we're using a multilayer sunburst plot to represent (from inner to outer layer) the following data:

  1. broad_histology
  2. cancer_group
  3. Type of DNA-seq
  4. Presence or absence of RNA-seq
  5. tumor_descriptor
  6. germline_sex_estimate

Here's a screen shot:

[Image: screenshot of the current sunburst plot, taken 2021-09-09]

(There is no legend here; we intended to make one, and that would certainly be an improvement, but it would not totally ameliorate my concerns.)

As expressed in a Google Slide comment, I am concerned that this is too busy to convey what we're hoping to convey: a reader cannot get a quick sense of the relevant categories for the cohort from this figure, particularly when the sample size is small.

Ideas

I think this is a case where we should rely very heavily on labels for the main text display and split out information we cannot successfully visualize into a supplemental table.

Figure

I think we should have a stacked bar plot where each bar represents a cancer_group, the stacked fill represents whether a sample has DNA-seq, RNA-seq, or both (or whatever is deemed most important), and the cancer_group bars are grouped by broad_histology (perhaps using facets).

See my sketch below:

[Image: sketch of the proposed stacked bar plot]
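
To make the proposal concrete, here is a minimal sketch of the tallies such a plot would need. It is not the repository's implementation: the records are made-up toy data, and the `has_dna` / `has_rna` fields and the `assay_category` and `stacked_bar_counts` helpers are hypothetical names introduced only for illustration. How assay availability is actually derived from the histologies file is not covered here.

```python
from collections import Counter, defaultdict

# Toy records (made up). has_dna / has_rna are assumed boolean fields,
# not columns known to exist in the histologies file.
TOY_SAMPLES = [
    {"broad_histology": "Histology A", "cancer_group": "Group 1", "has_dna": True, "has_rna": True},
    {"broad_histology": "Histology A", "cancer_group": "Group 1", "has_dna": True, "has_rna": False},
    {"broad_histology": "Histology A", "cancer_group": "Group 2", "has_dna": False, "has_rna": True},
    {"broad_histology": "Histology B", "cancer_group": "Group 3", "has_dna": True, "has_rna": True},
]


def assay_category(sample):
    """Map a sample to the fill category of the stacked bar."""
    if sample["has_dna"] and sample["has_rna"]:
        return "Both"
    if sample["has_dna"]:
        return "DNA-seq only"
    if sample["has_rna"]:
        return "RNA-seq only"
    return "Neither"


def stacked_bar_counts(samples):
    """Return {broad_histology: {cancer_group: Counter of assay categories}}.

    Each broad_histology would be one facet, each cancer_group one bar,
    and each Counter entry one stacked segment of that bar.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for sample in samples:
        facet = sample["broad_histology"]
        bar = sample["cancer_group"]
        counts[facet][bar][assay_category(sample)] += 1
    return counts


if __name__ == "__main__":
    for facet, bars in stacked_bar_counts(TOY_SAMPLES).items():
        for bar, segments in bars.items():
            print(facet, bar, dict(segments))
```

Keeping the tally separate from the drawing step means the same counts could feed whichever plotting code the module ends up using.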

Supplemental table

We can then create a table that tells readers about the other variables – tumor_descriptor and germline_sex_estimate. (It's not obvious to me that this information needs to be coupled with the assays info or that we need to communicate the interaction between these variables; curious readers will have access to the histologies file.)

To get counts (and %), we'd group by cancer_group and then sort by broad_histology and the columns would be:

broad histology | cancer group | tumor descriptor | germline sex estimate
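
As a rough sketch of the counting step, assuming each sample is a record carrying the four fields named above: the `summarize` helper and the toy records below are hypothetical, and computing the percentage within each cancer_group is my assumption about the intended denominator rather than something settled in this issue.

```python
from collections import Counter, defaultdict

# Toy records (made up); the values are placeholders, not real categories.
TOY_ROWS = [
    {"broad_histology": "Histology B", "cancer_group": "Group 3",
     "tumor_descriptor": "Descriptor X", "germline_sex_estimate": "Sex 1"},
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Descriptor X", "germline_sex_estimate": "Sex 1"},
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Descriptor X", "germline_sex_estimate": "Sex 2"},
    {"broad_histology": "Histology A", "cancer_group": "Group 1",
     "tumor_descriptor": "Descriptor Y", "germline_sex_estimate": "Sex 2"},
]


def summarize(samples, variable):
    """Count the values of `variable` within each cancer_group.

    Returns rows of (broad_histology, cancer_group, value, count, percent),
    sorted by broad_histology and then cancer_group. The percent is taken
    relative to the cancer_group total (an assumption).
    """
    per_group = defaultdict(Counter)
    for sample in samples:
        key = (sample["broad_histology"], sample["cancer_group"])
        per_group[key][sample[variable]] += 1

    rows = []
    for histology, group in sorted(per_group):
        counter = per_group[(histology, group)]
        total = sum(counter.values())
        for value, n in sorted(counter.items()):
            rows.append((histology, group, value, n, round(100 * n / total, 1)))
    return rows


if __name__ == "__main__":
    for variable in ("tumor_descriptor", "germline_sex_estimate"):
        print(variable)
        for row in summarize(TOY_ROWS, variable):
            print("  ", row)
```

Summarizing each variable separately matches the point above that the interaction between these variables does not necessarily need to be communicated.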

Next steps

This should be implemented in analyses/sample-distribution-analysis, which I am happy to do myself, provided folks are on board.

jharenza commented 2 years ago

Hi @jaclyn-taroni - this looks fine. How about also adding a small pie chart of the tumor descriptors in a panel? I think that information is more valuable than sex. We can highlight in the text that we have a lot of diagnostic high-grade tumors, which traditionally were not biopsied.

sjspielman commented 2 years ago

Coming in to pick this back up.

Some concepts to explore for this plot:

sjspielman commented 2 years ago

Closing this issue in favor of a new issue to track subsequent tables and supplementary figures.