New sample distribution main & supplemental display items

AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

New sample distribution main & supplemental display items #1201

Closed jaclyn-taroni closed 2 years ago

jaclyn-taroni commented 3 years ago

⚠️ downstream of #1193 and #1195 + cherry-picked the ggpattern installation from #1197

Addresses part of #1175, specifically the main display item that would replace the multilayer sunburst plot. For each broad histology display group, the added script creates a two-plot panel:

1. A stacked bar plot, where the fill is the cancer group hex code (so many cancer groups are grey) and the pattern represents whether a sample has been assayed with RNA-seq, DNA-seq, or both.
   - The y limits are not the same for each panel, but the total number of samples is in text at the top of the bar.
2. A stacked bar plot that shows the proportion of each cancer type per tumor descriptor (e.g., Initial CNS tumor) using the tumor descriptor palette for the fill.

Examples

Diffuse astrocytic and oligodendroglial tumor

[Figure: two_panels]

Embryonal tumor

[Figure: two_panels]

And the legends

[Figure: assay_pattern_legend]

[Figure: tumor_descriptor_legend]

jharenza commented 3 years ago

@jaclyn-taroni I am not loving the patterns in the bars, but maybe we can simplify and use only panel 2 with the total N's per group in parentheses in the X-axis label or above the bar, and leave out the N by assay from this part of the figure?
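For readers following along, the "total N in the x-axis label" idea can be sketched in a few lines of base R. This is only an illustration and not code from this PR; the data frame, its column names (`cancer_group`, `n`), and the counts are hypothetical.

```r
# Hypothetical per-group totals; the group names and counts are made up.
group_totals <- data.frame(
  cancer_group = c("Group A", "Group B"),
  n = c(15, 7),
  stringsAsFactors = FALSE
)

# Build an axis label that carries the total N in parentheses.
group_totals$axis_label <- paste0(
  group_totals$cancer_group, " (N = ", group_totals$n, ")"
)

print(group_totals$axis_label)
```

Labels built this way could then be mapped to the x aesthetic in place of the bare group name, which would make a separate count panel unnecessary.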

jaclyn-taroni commented 2 years ago

Okay this is now split into the tumor descriptor plot (for main display):

And the stacked bar plot, just in case we want it for the supp:

Easier to take the code out once it's in than the other way around.

jaclyn-taroni commented 2 years ago

The sample counts between the two figures differ and I know why, but I think it's worth discussing which one is "correct" for the tumor descriptor plots. At this point, I will request @jashapiro for review.

jaclyn-taroni commented 2 years ago

While we are at it...

> The sample counts between the two figures differ and I know why, but I think it's worth discussing which one is "correct" for the tumor descriptor plots.

The tumor descriptor plots count specimens. The experimental strategy plot counts sample ids, which is necessary because the pattern displays how a given sample was assayed. Should the tumor descriptor plot also count sample ids?

Tagging @jharenza and @jashapiro
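To make the distinction concrete, here is a minimal base R sketch of why counting specimens and counting sample ids can give different totals. It is not the code in this PR: the toy rows and the column names (`biospecimen_id`, `sample_id`, `experimental_strategy`, `tumor_descriptor`) are assumptions chosen for illustration.

```r
# Toy data: hypothetical rows, not taken from the project's histologies file.
# A sample id appears once per assay, so it can span several specimens.
histologies <- data.frame(
  biospecimen_id = c("BS_1", "BS_2", "BS_3", "BS_4", "BS_5"),
  sample_id = c("S1", "S1", "S2", "S3", "S3"),
  experimental_strategy = c("WGS", "RNA-Seq", "WGS", "WGS", "RNA-Seq"),
  tumor_descriptor = c(
    "Initial CNS Tumor", "Initial CNS Tumor", "Initial CNS Tumor",
    "Recurrence", "Recurrence"
  ),
  stringsAsFactors = FALSE
)

# Counting specimens: every row (biospecimen) contributes to the total.
specimen_counts <- table(histologies$tumor_descriptor)

# Counting sample ids: collapse to one row per sample id before counting.
unique_samples <- unique(histologies[, c("sample_id", "tumor_descriptor")])
sample_counts <- table(unique_samples$tumor_descriptor)

# Per sample id, record whether it was assayed with RNA-seq, DNA-seq, or both.
# Any strategy other than "RNA-Seq" is treated as DNA here (an assumption).
assay_by_sample <- tapply(
  histologies$experimental_strategy,
  histologies$sample_id,
  function(strategies) {
    has_rna <- "RNA-Seq" %in% strategies
    has_dna <- any(strategies != "RNA-Seq")
    if (has_rna && has_dna) "Both" else if (has_rna) "RNA-Seq" else "DNA-Seq"
  }
)

print(specimen_counts)
print(sample_counts)
print(assay_by_sample)
```

In this toy case the specimen counts are larger than the sample id counts because two sample ids were assayed twice, which is the kind of discrepancy described above.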

jashapiro commented 2 years ago

> While we are at it...
>
> > The sample counts between the two figures differ and I know why, but I think it's worth discussing which one is "correct" for the tumor descriptor plots.
>
> The tumor descriptor plots count specimens. The experimental strategy plot counts sample ids, which is necessary because the pattern displays how a given sample was assayed. Should the tumor descriptor plot also count sample ids?
>
> Tagging @jharenza and @jashapiro

I feel like sample ids is the more relevant count for the tumor descriptors, personally.

jaclyn-taroni commented 2 years ago

> I feel like sample ids is the more relevant count for the tumor descriptors, personally.

That's what I was thinking, too. I can make that change when I address https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1201#discussion_r781324156

jaclyn-taroni commented 2 years ago

Okay, I've simplified things quite a bit with facet_wrap() for both plots! I suspect this is the kind of thing that required patchwork when I initially filed it with the per-broad histology panels that included the experimental strategy and tumor descriptor plots. Now things are quite different! But it still makes sense to save the pattern legend separately.
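As a rough illustration of the `facet_wrap()` approach (a sketch under stated assumptions, not the script in this PR), a single ggplot2 call can produce one proportion panel per display group without assembling panels by hand. The data frame, its column names, the group labels, and the counts below are all hypothetical.

```r
library(ggplot2)

# Hypothetical counts per display group, cancer group, and tumor descriptor.
plot_df <- data.frame(
  broad_histology_display = rep(
    c("Embryonal tumor", "Diffuse astrocytic and oligodendroglial tumor"),
    each = 4
  ),
  cancer_group = rep(c("Group A", "Group B"), times = 4),
  tumor_descriptor = rep(
    rep(c("Initial CNS Tumor", "Progressive"), each = 2),
    times = 2
  ),
  n = c(10, 5, 3, 2, 8, 6, 4, 1),
  stringsAsFactors = FALSE
)

# position = "fill" rescales each stacked bar to proportions;
# facet_wrap() draws one panel per display group.
tumor_descriptor_plot <- ggplot(
  plot_df,
  aes(x = cancer_group, y = n, fill = tumor_descriptor)
) +
  geom_col(position = "fill") +
  facet_wrap(~ broad_histology_display, scales = "free_x") +
  labs(x = NULL, y = "Proportion", fill = "Tumor descriptor") +
  theme_bw()
```

Because every panel shares one legend in a faceted plot, only a legend that belongs to a different plot (such as the assay pattern legend) would still need to be saved on its own.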

jaclyn-taroni commented 2 years ago

Filed #1217 to keep track of the remaining issue! I'm going to update it and merge it.