AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Updated analysis: Unified color palette for plots #510

Closed by cansavvy 4 years ago

cansavvy commented 4 years ago

What analysis module should be updated and why?

All modules that have plots with colors.

We should probably prioritize plots that will be in the main document? But we probably want the unified color palette to also extend to non-main figures.

What changes need to be made? Please provide enough detail for another participant to make the update.

We should have a unified color palette. This helps interpretability and aesthetics. The simplecolors R package has some helpful tools and a nice vignette: https://cran.r-project.org/web/packages/simplecolors/vignettes/intro.html

For ggplot2 plots, colors can be designated using scale_fill_manual and scale_color_manual.
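As a minimal sketch of what that could look like, assuming the palette is stored as a named character vector keyed by `short_histology` (the data frame, group names, and hex codes below are made up for illustration and are not the project's colors):

```r
library(ggplot2)

# Hypothetical palette: names are histology groups, values are hex codes
histology_colors <- c(
  ATRT = "#1b9e77",
  Medulloblastoma = "#d95f02",
  Ependymoma = "#7570b3"
)

# Toy data standing in for a per-sample metadata table
df <- data.frame(
  short_histology = c("ATRT", "Medulloblastoma", "Ependymoma", "ATRT"),
  tmb = c(1.2, 0.4, 2.3, 0.9)
)

# Because the palette is named, each group is matched to its color by name
# rather than by factor level order
p <- ggplot(df, aes(x = short_histology, y = tmb, fill = short_histology)) +
  geom_boxplot() +
  scale_fill_manual(values = histology_colors)
```

`scale_color_manual(values = histology_colors)` works the same way for the `color` aesthetic, so one named vector could serve both fills and outlines.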

Which colors do we generally want to default to?

    # Translate into colors
    col_key <- hsv(h = col_val, s = col_val, v = 1)

    # Make this named based on histology
    names(col_key) <- unique(df$short_histology)

    # Make the same order as the data.frame
    col_key <- as.character(dplyr::recode(df$short_histology, !!!col_key))

    # Make the names
    names(col_key) <- as.character(rownames(df))
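A self-contained variant of the same idea in base R might look like the sketch below. The toy `df`, the evenly spaced hues used for `col_val`, and the fixed saturation are all assumptions on my part, since the snippet above does not define them; indexing a named vector stands in for `dplyr::recode`.

```r
# Toy metadata; sample IDs and histologies are made up for illustration
df <- data.frame(
  short_histology = c("ATRT", "Medulloblastoma", "ATRT", "Ependymoma"),
  row.names = c("sample_1", "sample_2", "sample_3", "sample_4"),
  stringsAsFactors = FALSE
)

groups <- unique(df$short_histology)

# Assumption: evenly spaced hues, one per histology group
col_val <- seq_along(groups) / length(groups)

# One color per group, named by histology
group_colors <- setNames(hsv(h = col_val, s = 0.8, v = 1), groups)

# One color per sample, in data.frame row order, named by sample ID
col_key <- setNames(unname(group_colors[df$short_histology]), rownames(df))
```

The per-sample vector is the shape that annotation arguments in heatmap functions typically expect, while `group_colors` is the shape a legend or `scale_fill_manual` would use.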


- For heatmaps or other continuous numeric data, we should choose a general color palette. Some plotting functions want a color function rather than a vector of colors, so we can use `circlize::colorRamp2` for those, but we should decide which hex codes/colors to use (I'm not necessarily suggesting the ones I have below).

col_fun <- circlize::colorRamp2( c(0, .25, .5, 1, 3), c("#edf8fb", "#b2e2e2", "#66c2a4", "#2ca25f", "#006d2c") )
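To show what a color function with unevenly spaced breaks does, here is a rough base R sketch of the same idea. It is not how `circlize::colorRamp2` is implemented, and the in-between colors it produces may differ from what that function returns; `value_to_color` is a hypothetical helper name.

```r
breaks <- c(0, 0.25, 0.5, 1, 3)
hex <- c("#edf8fb", "#b2e2e2", "#66c2a4", "#2ca25f", "#006d2c")

# colorRamp() returns a function mapping [0, 1] to RGB, with the input
# colors spaced evenly along that interval
ramp <- grDevices::colorRamp(hex)

value_to_color <- function(x) {
  # Map each value onto [0, 1] so that every break lands on its own color;
  # rule = 2 clamps values outside the breaks to the end colors
  pos <- stats::approx(
    x = breaks,
    y = seq(0, 1, length.out = length(breaks)),
    xout = x,
    rule = 2
  )$y
  rgb_mat <- ramp(pos)
  grDevices::rgb(rgb_mat[, 1], rgb_mat[, 2], rgb_mat[, 3], maxColorValue = 255)
}

heat_colors <- value_to_color(c(0, 0.1, 0.5, 2, 10))
```

Keeping the breaks and hex codes in two plain vectors means the same pair could be handed to whichever color-function builder a given plotting package expects.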


### Modules with plots that will need to be color palette unified:
I've tagged myself on the modules whose palettes I will be responsible for updating; others can add themselves for other modules.

| Module | Person who will update the plots | Plots in this module to be updated? |
|--------|----------------------------------|----------------------|
| [ ] chromosomal-instability | @cansavvy | `breaks_cdf_plot.png`, 3 heatmaps, tumor-type plots |
| [ ] cnv-chrom-plot | @cansavvy | `gistic.png` and histology group plots |
| [ ] cnv-comparison | | |
| [ ] focal-cn-file-preparation | | |
| [ ] immune-deconv | | |
| [ ] interaction-plots | | |
| [ ] molecular-subtyping-ATRT | | |
| [ ] mutational-signatures | @cansavvy | The bubble matrix plots, all `cosmic/` and `nature/` plots, individual and grouped barplots |
| [ ] oncoprint-landscape | @cbethell | The 4 oncoprint plots (`all_participants_` png plots) |
| [ ] sample-distribution-analysis | @cbethell | |
| [ ] selection-strategy-comparison | | |
| [ ] sex-prediction-from-RNASeq | | |
| [ ] snv-callers | @cansavvy | All comparison plots |
| [ ] ssgsea-hallmark | | |
| [ ] survival-analysis | @cansavvy | `survival_curve_gender.pdf` |
| [ ] tmb-compare-tcga | @cansavvy | Main TMB compare plot |
| [ ] tp53_nf1_score | | |
| [ ] transcriptomic-dimension-reduction | | |

#### When do you expect the revised analysis will be completed?
Unknown for now. We should first refine which plots are the priority before we can make this call.

jaclyn-taroni commented 4 years ago

Tagging @sjspielman to weigh in about color palette.

sjspielman commented 4 years ago

A strategy like this was suggested to me for the color palette: use an existing, smaller palette that is colorblind friendly and reasonably easy to distinguish, and then use a different gradient within each of its colors. This would work out well if we have some kind of higher-level grouping for subtypes? https://stackoverflow.com/questions/50163072/different-colors-with-gradient-for-subgroups-on-a-treemap-ggplot2-r/50164882#50164882

jaclyn-taroni commented 4 years ago

Wow, that S.O. post makes me want to consider reworking our treemaps in sample-distribution-analysis, too! (Related to publication-ready figures: #571)

jashapiro commented 4 years ago

Okay, this is not exactly what we want, but it was something I did in the past (and mentioned before in person) to generate a lot of colors that are pretty distinguishable. This example has 49 colors, which is definitely pushing it. It could be a place to start.

colorscheme = hsv(h = 1:49/49 * .85, v = c(.8,1,1), s = c(1,1, .6))
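As I read it, the one-liner relies on `hsv()` recycling its shorter arguments. The sketch below spells that out; the variable names `n` and `hues` are mine, and the commented-out `barplot()` call is just one possible way to eyeball the result.

```r
n <- 49

# Hues cover 85% of the color wheel, so the last colors do not wrap back
# around to the red end where the sequence starts
hues <- seq_len(n) / n * 0.85

# s and v are shorter than h, so hsv() recycles them: colors cycle through
# darker (v = 0.8), full strength, and less saturated (s = 0.6)
colorscheme <- hsv(h = hues, s = c(1, 1, 0.6), v = c(0.8, 1, 1))

# Quick visual check of the swatches
# barplot(rep(1, n), col = colorscheme, border = NA, axes = FALSE)
```

Alternating brightness and saturation is what keeps neighboring hues apart, which matters more as the number of groups grows.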

jashapiro commented 4 years ago

Ooh, I just found: http://phrogz.net/css/distinct-colors.html

Which allowed me to generate this set:

#400000, #ffaa00, #bfffd0, #3370cc, #bf0099, #bf3030, #8c5e00, #005924, #0030b3, #731d4b, #d9a3a3, #f2da79, #2d5950, #001140, #ff0066, #ff4400, #8c8569, #3df2e6, #2b2633, #594943, #474d00, #00ccff, #a200f2, #bf6930, #cef23d, #b6def2, #796080, #331c0d, #00b330, #2d4459, #ffbffb

Not perfect, but not bad... Dropping some of the dark colors would help, I expect
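One way to drop the dark colors programmatically is to filter on the HSV value channel. This is only a sketch: the 0.5 cutoff is an arbitrary assumption, and `distinct_colors` and `lighter` are illustrative names.

```r
distinct_colors <- c(
  "#400000", "#ffaa00", "#bfffd0", "#3370cc", "#bf0099", "#bf3030", "#8c5e00",
  "#005924", "#0030b3", "#731d4b", "#d9a3a3", "#f2da79", "#2d5950", "#001140",
  "#ff0066", "#ff4400", "#8c8569", "#3df2e6", "#2b2633", "#594943", "#474d00",
  "#00ccff", "#a200f2", "#bf6930", "#cef23d", "#b6def2", "#796080", "#331c0d",
  "#00b330", "#2d4459", "#ffbffb"
)

# HSV "value" (brightness) of each color, on a 0 to 1 scale
value <- rgb2hsv(col2rgb(distinct_colors))["v", ]

# Assumption: treat anything below half brightness as too dark to keep
min_value <- 0.5
lighter <- distinct_colors[value >= min_value]
```

Raising or lowering `min_value` trades the number of colors kept against how easy the remaining ones are to tell apart on a light background.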

cansavvy commented 4 years ago

@dvenprasad and I chatted a bit about color palettes. Here are the colorsets I believe we need:

1) Color palette for each histology group in short_histology.
2) A gradient color scale (for things like TMB).
3) A divergent color scale (for things like seg.means) ...
4) A binary color key (for things like CN status). The most extreme colors in the divergent color scale can be used for this binary color key.
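The four color sets could be held in a single R list, roughly as sketched here. Every hex code is a placeholder rather than a proposed choice, and the list and element names are hypothetical.

```r
# Placeholder hex codes throughout; these are not the project's chosen colors
palettes <- list(
  # 1) Categorical: one named color per short_histology group
  histology = c(ATRT = "#1b9e77", Medulloblastoma = "#d95f02"),
  # 2) Gradient (sequential), e.g. for TMB
  gradient = c("#edf8fb", "#b2e2e2", "#66c2a4", "#2ca25f", "#006d2c"),
  # 3) Divergent with a neutral midpoint, e.g. for seg.mean
  divergent = c("#2166ac", "#92c5de", "#f7f7f7", "#f4a582", "#b2182b")
)

# 4) Binary key: reuse the two most extreme colors of the divergent scale
palettes$binary <- palettes$divergent[c(1, length(palettes$divergent))]
```

Deriving the binary key from the divergent scale, rather than storing it separately, keeps the two in sync if the divergent colors change later.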

@dvenprasad also found these two tools that we can use to poke around: https://www.colorbox.io/ https://projects.susielu.com/viz-palette

Next steps:

1) I'm going to attempt to pick a color palette for items 1 - 4 using the tools listed above and also the suggestions that have been placed on this issue.
2) I will test them for colorblind friendliness with Color Oracle.
3) I'll file a draft PR with suggested color palettes and options.
4) I'll try to create R color key objects that we can apply to all plots, with instructions on how to apply them, and put the HEX codes in a table that can live in a README (not sure which README is appropriate).

cansavvy commented 4 years ago

With #622 merged, we are ready to update figures to the unified color palette (See the README in figures for instructions). If there are any changes that need to be made to the color palette as we are starting to implement, you can note them here and I can help with that.

jashapiro commented 4 years ago

In #622, colors are defined for short_histology, but not for other histology definitions. I propose we have colors defined for broad_histology as well and a table that assigns the same colors (or slightly modified versions) to integrated_diagnosis. Unfortunately, it appears that short_histology does not neatly nest in broad_histology, which could make this a challenge.
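Whether one label nests inside another can be checked by counting how many distinct `broad_histology` values each `short_histology` maps to. The toy table below is invented purely to show the check and does not reflect the real metadata.

```r
# Toy metadata; the group assignments are invented for illustration
histologies <- data.frame(
  short_histology = c("ATRT", "Medulloblastoma", "Other", "Other"),
  broad_histology = c("Embryonal tumor", "Embryonal tumor",
                      "Benign tumor", "Metastatic secondary tumors"),
  stringsAsFactors = FALSE
)

# Number of distinct broad_histology values seen for each short_histology
n_broad <- tapply(
  histologies$broad_histology,
  histologies$short_histology,
  function(x) length(unique(x))
)

# Groups that map to more than one broad_histology break the nesting
not_nested <- names(n_broad)[n_broad > 1]
```

Any group listed in `not_nested` cannot simply inherit a single `broad_histology` color, which is the challenge described above.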

I am also unsure of the difference between na_color and Other, but from examining the results in #633, it appears that Other should probably be colored the same (or a similarly neutral grey) to na_color in that the Other short_histology includes tumors of quite varied types. With its current prominent reddish color, this "category" seems to indicate meaning where none is likely to exist.

As a minor clarification, I propose that the histology_color_palette.tsv use short_histology rather than color_names as the column header. (I also like singular column headers, but that is super minor and probably too late to change!).
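With that header, turning the table into a named vector for plotting might look like the sketch below. The inline text stands in for the TSV file; `hex_code` is a hypothetical name for the hex column, and the rows are placeholders.

```r
# Inline stand-in for the palette TSV. `short_histology` follows the header
# proposed above; `hex_code` and all values are assumptions for illustration
tsv_text <- "short_histology\thex_code
ATRT\t#1b9e77
Medulloblastoma\t#d95f02
Other\t#b5b5b5"

# comment.char = "" keeps '#' from being read as the start of a comment
palette_df <- read.delim(
  text = tsv_text,
  stringsAsFactors = FALSE,
  comment.char = ""
)

# Named vector: names are histology groups, values are hex codes
histology_colors <- setNames(palette_df$hex_code, palette_df$short_histology)
```

Naming the column after the variable it keys makes joins against the metadata table self-explanatory.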

What I might like to see is something like the following for histology_color_palette, but I fear this is not possible, given constraints above.

| integrated_diagnosis | short_histology | broad_histology | integrated_diagnosis_color | short_histology_color | broad_histology_color |
|---|---|---|---|---|---|
| Atypical Teratoid Rhabdoid Tumor | ATRT | Embryonal tumor | | | |
| Medulloblastoma | Medulloblastoma | Embryonal tumor | | | |

Note: I will be filing a separate data issue about the short_histology labels, specifically Other which seems to include benign and metastatic tumors, as well as other broad histologies that do not seem like they should be collapsed for any reasonable grouping.

cansavvy commented 4 years ago

> I am also unsure of the difference between na_color and Other, but from examining the results in #633, it appears that Other should probably be colored the same (or a similarly neutral grey) to na_color in that the Other short_histology includes tumors of quite varied types. With its current prominent reddish color, this "category" seems to indicate meaning where none is likely to exist.

Yes, I wasn't sure how Other vs no assignment for short_histology were being assigned, so I didn't want to merge that and lose the information, so I left it as is.

jashapiro commented 4 years ago

> > I am also unsure of the difference between na_color and Other, but from examining the results in #633, it appears that Other should probably be colored the same (or a similarly neutral grey) to na_color in that the Other short_histology includes tumors of quite varied types. With its current prominent reddish color, this "category" seems to indicate meaning where none is likely to exist.
>
> Yes, I wasn't sure how Other vs no assignment for short_histology were being assigned, so I didn't want to merge that and lose the information, so I left it as is.

Looks like NA is exclusively "non-tumor", which makes sense. But "Other" is a mix of unrelated things, as discussed in #647

cbethell commented 4 years ago

The oncoprint landscape plots in the oncoprint-landscape module of this repository currently implement a color palette sourced from https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/oncoprint-landscape/util/oncoplot-palette.R.

This color palette contains hex codes for unique categories of SNVs, CNVs, and fusion data.

It is being implemented in the PR that gets the oncoprint landscape figure publication ready (WIP PR #666), and as @cansavvy noted in a review comment, it should probably be adjusted (to be made uniform) and incorporated into the color palette strategy.

jaclyn-taroni commented 4 years ago

We now have unified color palettes and their usage is documented here: https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/figures#color-palette-usage

The majority of figures in figures/png use these palettes. I am going to close this issue in favor of more focused issues for individual figures as they come up.