Discussion: Cancer groups in V22 figures

sjspielman commented 2 years ago

Tagging @jharenza @jashapiro @jaclyn-taroni for discussion.

With V22, we now have different numbers of samples across certain cancer groups. The purpose of this issue is to discuss a unified strategy for ensuring that figures are roughly consistent in which cancer groups they display, which may benefit from some hard-coding (see here: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1453#discussion_r904046412).

These figures include:

3A/B interaction plots, which have been updated in #1453 to use the following cancer groups (total N=13):

"Diffuse midline glioma",
"Other high-grade glioma",
"Pilocytic astrocytoma",
"Ganglioglioma",
"Pleomorphic xanthoastrocytoma",
"Other low-grade glioma",
"Medulloblastoma",
"Atypical Teratoid Rhabdoid Tumor",
"Other embryonal tumor",
"Ependymoma",
"Craniopharyngioma",
"Meningioma",
"Other"

3E (mutational signatures jitter), S4a (mutational signatures barplot), S4b, and 4D (tp53 and telomerase scores) were originally created to match the cancer groups in the interaction plot, but that list has since changed! For example, see the mutational signatures notebook (noting some of these are now deprecated): https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/d6749d080eae20100cedef2f0ed35429153e7ea5/analyses/mutational-signatures/07-plot_cns_fit.Rmd#L88-L104

5C immune cell types, which programmatically plots all cancer groups with N>=15:

cg_of_interest <- palette_mapping_df %>%
  filter(!is.na(cancer_group)) %>%
  count(cancer_group_display) %>%
  filter(n >= 15) %>%
  pull(cancer_group_display)

The easiest solution here is probably just to use the array of cancer groups exactly as defined in the interaction plot script and hardcode the groups of interest. However, most of these plots are faceted, and 13 panels will not look great, but the data is what the data is, I suppose!
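To make the hardcoding idea concrete, here is a minimal sketch, not the code from any of the figure scripts. The vector name `interaction_cancer_groups`, the toy data frame, and its `score` column are all made up for illustration; only the 13 group names come from the list above.

```r
library(dplyr)

# The 13 cancer groups used by the interaction plots (copied from the list above).
# The vector name is hypothetical, not taken from the repository.
interaction_cancer_groups <- c(
  "Diffuse midline glioma",
  "Other high-grade glioma",
  "Pilocytic astrocytoma",
  "Ganglioglioma",
  "Pleomorphic xanthoastrocytoma",
  "Other low-grade glioma",
  "Medulloblastoma",
  "Atypical Teratoid Rhabdoid Tumor",
  "Other embryonal tumor",
  "Ependymoma",
  "Craniopharyngioma",
  "Meningioma",
  "Other"
)

# Toy stand-in for a figure's plotting data; values are invented.
toy_df <- data.frame(
  cancer_group_display = c("Medulloblastoma", "Schwannoma", "Other", "Ependymoma"),
  score = c(0.2, 0.5, 0.1, 0.9)
)

# Keep only the hardcoded groups, and fix the factor order so every
# figure displays the groups in the same sequence.
plot_df <- toy_df %>%
  filter(cancer_group_display %in% interaction_cancer_groups) %>%
  mutate(
    cancer_group_display = factor(
      cancer_group_display,
      levels = interaction_cancer_groups
    )
  )
```

Defining the vector once and sourcing it from each figure script would keep the figures from drifting apart again, at the cost of having to update it by hand when the groups change.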

Any thoughts?

jashapiro commented 2 years ago

If we leave off Other, which seems reasonable enough for at least some cases, then N=12, which can be plotted reasonably. In previous plots for mutation sigs we had not included other.
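As a rough sketch of what this could look like (assuming the plots use ggplot2 faceting; the data and the `score` column are invented for illustration), dropping "Other" leaves 12 groups, which fill a 3 x 4 grid evenly:

```r
library(ggplot2)

# The 13 interaction plot groups; the vector name is hypothetical.
interaction_cancer_groups <- c(
  "Diffuse midline glioma", "Other high-grade glioma",
  "Pilocytic astrocytoma", "Ganglioglioma",
  "Pleomorphic xanthoastrocytoma", "Other low-grade glioma",
  "Medulloblastoma", "Atypical Teratoid Rhabdoid Tumor",
  "Other embryonal tumor", "Ependymoma",
  "Craniopharyngioma", "Meningioma", "Other"
)

# Drop "Other" for the faceted figures, leaving 12 groups.
facet_groups <- setdiff(interaction_cancer_groups, "Other")

# Toy data: five invented values per group.
set.seed(2022)
toy_df <- data.frame(
  cancer_group_display = factor(
    rep(facet_groups, each = 5),
    levels = facet_groups
  ),
  score = runif(length(facet_groups) * 5)
)

# 12 panels in 4 columns gives a full 3 x 4 grid with no empty cells.
facet_plot <- ggplot(toy_df, aes(x = cancer_group_display, y = score)) +
  geom_jitter(width = 0.2) +
  facet_wrap(~cancer_group_display, ncol = 4, scales = "free_x")
```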

sjspielman commented 2 years ago

> If we leave off Other, which seems reasonable enough for at least some cases, then N=12, which can be plotted reasonably. In previous plots for mutation sigs we had not included other.

This is what I was maybe thinking too.

jharenza commented 2 years ago

Agree with leaving out "Other" from non-interaction plots, but for:

> 5C immune cell types which programmatically plots all cancer groups with N>=15:

We will want to modify the groups to at least include the ones discussed in the manuscript (NF and schwannoma are not included above but had the most striking immune deconv to go along with biological discussion of results)
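One possible way to do this, sketched under assumptions: keep the N>=15 threshold but also force in a short list of groups discussed in the manuscript. The `manuscript_groups` vector and the display names in it are guesses for illustration, and the toy `palette_mapping_df` below uses invented counts.

```r
library(dplyr)

# Toy stand-in for palette_mapping_df, one row per sample; counts are invented.
palette_mapping_df <- data.frame(
  cancer_group = c(
    rep("Medulloblastoma", 20),
    rep("Ependymoma", 16),
    rep("Schwannoma", 6),
    rep("Neurofibroma plexiform", 4),
    rep("Chordoma", 3),
    NA
  )
)
palette_mapping_df$cancer_group_display <- palette_mapping_df$cancer_group

# Hypothetical: groups to always show because the manuscript discusses them.
# The exact display names are assumptions.
manuscript_groups <- c("Neurofibroma plexiform", "Schwannoma")

# Keep groups that pass the sample-size threshold OR are explicitly requested.
cg_of_interest <- palette_mapping_df %>%
  filter(!is.na(cancer_group)) %>%
  count(cancer_group_display) %>%
  filter(n >= 15 | cancer_group_display %in% manuscript_groups) %>%
  pull(cancer_group_display)
```

With the toy counts, this keeps the two large groups plus the two requested ones and drops the small group that nobody asked for.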

sjspielman commented 2 years ago

> We will want to modify the groups to at least include the ones discussed in the manuscript (NF and schwannoma are not included above but had the most striking immune deconv to go along with biological discussion of results)

@jharenza I'm filing the immune-deconv re-run now, so we can figure this out precisely in #1458 next on deck!

Edit!! When I was looking at the diffs for filing the module re-run, I saw there aren't any! I ran it again to be sure, and indeed no diffs. It makes sense that the deconvolution results do not change with V22 since the result files don't store metadata along with the deconvolution results. Therefore, I wonder if it makes sense for me to just update the figures straight away in this branch I have going: https://github.com/sjspielman/OpenPBTA-analysis/tree/immune-deconv-v22

We just need to formally decide how to select cancer groups for this.

jaclyn-taroni commented 2 years ago

I am assuming that this can be closed, as it was probably handled in the individual figure reruns. I'm not sure though, so I'm going to leave it open for now.

jharenza commented 2 years ago

I also think this can be closed as completed!

AlexsLemonade / OpenPBTA-analysis

Discussion: Cancer groups in V22 figures #1474