Closed cansavvy closed 4 years ago
Tagging @sjspielman to weigh in about color palette.
A strategy like this was suggested to me for color palette. Use an existing smaller color palette that is colorblind friendly and reasonable to distinguish, and then have different gradients. This would work out well if we have some kind of higher level grouping for subtypes? https://stackoverflow.com/questions/50163072/different-colors-with-gradient-for-subgroups-on-a-treemap-ggplot2-r/50164882#50164882
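To make the idea concrete, here is a minimal sketch in base R of one way a small base palette could be expanded into per-subgroup gradients. The group names, subtype counts, and hex codes are placeholders (the two base colors are the Okabe-Ito blue and orange), not anything decided in this issue.

```r
# Hypothetical base colors for two higher-level groups (Okabe-Ito blue and orange).
base_colors <- c(group_a = "#0072B2", group_b = "#E69F00")

# Hypothetical number of subtypes within each group.
n_subtypes <- c(group_a = 3, group_b = 2)

# For each group, interpolate from the base color toward white and
# drop the white endpoint so every shade stays visibly tinted.
subtype_colors <- lapply(names(base_colors), function(group) {
  ramp <- grDevices::colorRampPalette(c(base_colors[[group]], "#FFFFFF"))
  shades <- ramp(n_subtypes[[group]] + 1)
  shades[seq_len(n_subtypes[[group]])]
})
names(subtype_colors) <- names(base_colors)
```

Each element of subtype_colors then holds the shades for one group, with the first shade equal to the group's base color.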
Wow, that S.O. post makes me want to consider reworking our treemaps in sample-distribution-analysis, too! (Related to publication-ready figures: #571)
Okay, this is not exactly what we want, but it was something I did in the past (and mentioned before in person) to generate a lot of colors that are pretty distinguishable. This example has 49 colors, which is definitely pushing it. It could be a place to start.
colorscheme = hsv(h = 1:49/49 * .85, v = c(.8,1,1), s = c(1,1, .6))
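For anyone reading the one-liner above: hsv() recycles shorter arguments, so the three-element s and v vectors repeat across the 49 hues. A sketch that spells the recycling out (the variable names here are just for illustration) might look like this:

```r
n_colors <- 49

# Evenly spaced hues, capped at 0.85 of the hue circle, presumably so the
# last colors do not wrap back around to the red at the start.
hues <- seq_len(n_colors) / n_colors * 0.85

# Cycle saturation and value in a three-step pattern so that neighboring
# hues also differ in intensity.
saturations <- rep_len(c(1, 1, 0.6), n_colors)
values <- rep_len(c(0.8, 1, 1), n_colors)

colorscheme <- hsv(h = hues, s = saturations, v = values)
```

Something like barplot(rep(1, n_colors), col = colorscheme, border = NA) gives a quick visual check of how distinguishable the result is.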
Ooh, I just found: http://phrogz.net/css/distinct-colors.html
Which allowed me to generate this set:
Not perfect, but not bad... Dropping some of the dark colors would help, I expect
@dvenprasad and I chatted a bit about color palettes. Here are the colorsets I believe we need:
1) Color palette for each histology group in short_histology.
2) A gradient color scale (for things like TMB).
3) A divergent color scale (for things like seg.means) ...
4) A binary color key (for things like CN status). The most extreme colors in the divergent color scale can be used for this binary color key.
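As a rough sketch of how colorsets 2 through 4 could relate to each other in R, assuming placeholder hex codes that are not the ones chosen for the project:

```r
# Hypothetical hex codes throughout; swap in whatever the project settles on.

# 2) Gradient (sequential) scale, e.g. for TMB: light to dark.
gradient_colors <- grDevices::colorRampPalette(c("#F7FBFF", "#08306B"))(9)

# 3) Divergent scale, e.g. for seg.means: two hues meeting at a neutral midpoint.
divergent_colors <- grDevices::colorRampPalette(
  c("#2166AC", "#F7F7F7", "#B2182B")
)(11)

# 4) Binary key: reuse the two most extreme colors of the divergent scale.
binary_colors <- divergent_colors[c(1, length(divergent_colors))]
```

Deriving the binary key from the divergent scale keeps the two in sync if the divergent endpoints are ever changed.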
@dvenprasad also found these two tools that we can use to poke around: https://www.colorbox.io/ https://projects.susielu.com/viz-palette
1) I'm going to attempt to pick a color palette for items 1 - 4 using the tools listed above and the suggestions that have been placed on this issue.
2) I will test them for colorblind friendliness with Color Oracle.
3) I'll file a draft PR with suggested color palettes and options.
4) I'll try to create R color key objects that we can apply to all plots, with instructions for how to apply them, and put the HEX codes in a table that can live in a README (not sure which README is appropriate).
With #622 merged, we are ready to update figures to the unified color palette (see the README in figures for instructions). If there are any changes that need to be made to the color palette as we are starting to implement, you can note them here and I can help with that.
In #622, colors are defined for short_histology, but not for other histology definitions. I propose we have colors defined for broad_histology as well and a table that assigns the same colors (or slightly modified versions) to integrated_diagnosis. Unfortunately, it appears that short_histology does not neatly nest in broad_histology, which could make this a challenge.
I am also unsure of the difference between na_color and Other, but from examining the results in #633, it appears that Other should probably be colored the same (or a similarly neutral grey) to na_color in that the Other short_histology includes tumors of quite varied types. With its current prominent reddish color, this "category" seems to indicate meaning where none is likely to exist.
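If that suggestion were adopted, the change could be as small as the sketch below. The palette, group names, and hex codes are placeholders for illustration, not the values from #622.

```r
# Hypothetical neutral grey and palette; not the project's actual values.
na_color <- "#BEBEBE"
histology_colors <- c(
  ATRT = "#0072B2",
  Medulloblastoma = "#E69F00",
  Other = "#D55E00"
)

# Give the heterogeneous "Other" group the same neutral grey as na_color.
histology_colors[["Other"]] <- na_color

# Look up a color per label, falling back to na_color for NA labels.
labels <- c("ATRT", "Other", NA, "Medulloblastoma")
sample_colors <- unname(histology_colors[labels])
sample_colors[is.na(sample_colors)] <- na_color
```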
As a minor clarification, I propose that the histology_color_palette.tsv use short_histology rather than color_names as the column header. (I also like singular column headers, but that is super minor and probably too late to change!).
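For illustration, here is a minimal sketch of how a palette TSV with a short_histology column could be read into a named color vector. The hex_code column name and the example rows are assumptions, and a temporary file stands in for histology_color_palette.tsv.

```r
# Hypothetical palette table; the hex_code column name is an assumption.
palette_df <- data.frame(
  short_histology = c("ATRT", "Medulloblastoma"),
  hex_code = c("#0072B2", "#E69F00"),
  stringsAsFactors = FALSE
)

# Write to a temporary file standing in for the real TSV.
tsv_path <- tempfile(fileext = ".tsv")
write.table(palette_df, tsv_path, sep = "\t", quote = FALSE, row.names = FALSE)

# comment.char = "" matters because hex codes start with "#". It is the
# read.delim() default, but read.table() would treat "#" as a comment.
palette_in <- read.delim(tsv_path, stringsAsFactors = FALSE, comment.char = "")

# Named vector: names are histology labels, values are hex codes.
histology_colors <- setNames(palette_in$hex_code, palette_in$short_histology)
```

A named vector in this shape can be passed directly as the values argument of ggplot2's scale_fill_manual.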
What I might like to see is something like the following for histology_color_palette, but I fear this is not possible, given constraints above.
| integrated_diagnosis | short_histology | broad_histology | integrated_diagnosis_color | short_histology_color | broad_histology_color |
|---|---|---|---|---|---|
| Atypical Teratoid Rhabdoid Tumor | ATRT | Embryonal tumor | | | |
| Medulloblastoma | Medulloblastoma | Embryonal tumor | | | |
Note: I will be filing a separate data issue about the short_histology labels, specifically Other, which seems to include benign and metastatic tumors, as well as other broad histologies that do not seem like they should be collapsed for any reasonable grouping.
> I am also unsure of the difference between na_color and Other, but from examining the results in #633, it appears that Other should probably be colored the same (or a similarly neutral grey) to na_color in that the Other short_histology includes tumors of quite varied types. With its current prominent reddish color, this "category" seems to indicate meaning where none is likely to exist.
Yes, I wasn't sure how Other versus no assignment for short_histology was being determined, and I didn't want to merge them and lose that information, so I left it as is.
Looks like NA is exclusively "non-tumor", which makes sense. But "Other" is a mix of unrelated things, as discussed in #647
The oncoprint landscape plots in the oncoprint-landscape module of this repository currently implement a color palette sourced from https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/oncoprint-landscape/util/oncoplot-palette.R.
This color palette contains hex codes for unique categories of SNVs, CNVs, and fusion data.
It is being implemented in the PR getting the oncoprint landscape figure publication ready (WIP PR #666), and as @cansavvy noted in a review comment, it should probably be adjusted (to be made uniform) and incorporated into the color palette strategy.
We now have unified color palettes and their usage is documented here: https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/figures#color-palette-usage
The majority of figures in figures/png use these palettes. I am going to close this issue in favor of more focused issues for individual figures as they come up.
What analysis module should be updated and why?
All modules that have plots with colors.
We should probably prioritize plots that will be in the main document? But we probably want the unified color palette to also extend to non-main figures.
What changes need to be made? Please provide enough detail for another participant to make the update.
We should have a unified color palette. This helps interpretability and aesthetics.
The simplecolors R package has some helpful tools and a nice vignette: https://cran.r-project.org/web/packages/simplecolors/vignettes/intro.html
For ggplot2 plots, colors can be designated using scale_fill_manual and scale_color_manual.
Which colors do we generally want to default to?
short_histology, so having the colors for each group in particular would help readers follow along better. We can use a numeric approach to try to get ~36 colors as different as possible. I started implementing this
colorblindr's palette selection for guidance on some variable color choices. Here's an example of what I mean, but I haven't yet tested these colors:
# Translate into colors
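# col_val is not defined in this excerpt; it is presumably a numeric vector of
# values in [0, 1], one per unique short_histology
col_key <- hsv(h = col_val, s = col_val, v = 1)
# Name each color by its histology
names(col_key) <- unique(df$short_histology)
# Put the colors in the same order as the data.frame
col_key <- as.character(dplyr::recode(df$short_histology, !!!col_key))
# Name each color by the data.frame's row names
names(col_key) <- as.character(rownames(df))
# Color function for a continuous variable
col_fun <- circlize::colorRamp2(
  c(0, .25, .5, 1, 3),
  c("#edf8fb", "#b2e2e2", "#66c2a4", "#2ca25f", "#006d2c")
)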