IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Colour palettes for ggplot2 #4806

Open dannyparsons opened 6 years ago

dannyparsons commented 6 years ago

We have yet to implement colour palettes in the graphics system. When we do, note that ggplot2 now supports viridis colours (https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html), which seem to have useful advantages over other palettes (and appear to have ggplot2's approval):

scale_colour_continuous() and scale_colour_gradient() are now controlled by global options ggplot2.continuous.colour and ggplot2.continuous.fill. These can be set to "gradient" (the default) or "viridis".

New scale_colour_viridis_c()/scale_fill_viridis_c() (continuous) and scale_colour_viridis_d()/scale_fill_viridis_d() (discrete) make it easy to use Viridis colour scales.
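As a minimal sketch of the two routes quoted above (a session-wide option versus an explicit scale per plot), assuming ggplot2 3.0.0 or later; the built-in `mpg` data and the plot names are only for illustration:

```r
library(ggplot2)

# Route 1: make viridis the default for continuous colour and fill scales.
# As far as I understand, the option is consulted when the plot is drawn,
# so it needs to be set at the time the plot is printed.
options(
  ggplot2.continuous.colour = "viridis",
  ggplot2.continuous.fill = "viridis"
)
p_default <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point()

# Route 2: request viridis explicitly, one plot at a time.
# `cty` is numeric, so the continuous (_c) variant applies.
p_continuous <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point() +
  scale_colour_viridis_c()

# `drv` is categorical, so the discrete (_d) variant applies.
p_discrete <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d()
```

Route 2 is probably the easier one to expose in a dialog, since it does not change the behaviour of other graphs in the same session.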

Ogik99 commented 5 years ago

@maxwellfundi any thoughts on this?

Ogik99 commented 5 years ago

The viridis colour scale will be applicable to the colour and fill aesthetics. I found the link below to be useful https://ggplot2.tidyverse.org/reference/scale_viridis.html

rdstern commented 3 years ago

This seems to be as far as we have got in the past 2.5 years. I suggest we need another tab on the plotting sub-dialogues. It could be called Colour or Colour Scales.

This is complicated, but is now well described here in Chapter 11.

I suggest that whatever we do, this is an instance where there will be more to add, and hence being able to edit the script file will be useful.

Here is an example where I added a colour scale for boxplots:

last_graph <- ggplot2::ggplot(data=Data, mapping=ggplot2::aes(y=tmin, fill=month_abbr, x=make_factor(month_abbr))) +
  ggplot2::geom_boxplot() +
  theme_grey() +
  ggplot2::theme(axis.text.x=ggplot2::element_text(angle=90, hjust=1, vjust=0.5)) +
  ggplot2::xlab(NULL) +
  scale_fill_viridis_d(option="E", direction=1, end=1, begin=0.5)

image

I notice we have viridis in R-Instat, but this command appears to be a part of ggplot2.

I strongly suggest that doing something simple to start would be very useful. Initially this could be mainly discrete for the sort of boxplots as above, plus colours for line plots and points at different factor levels. Later continuous scales will be needed for maps.

What I couldn't find (not yet worth spending long on it) is a good scale for circular data. The example above is an instance. In all the standard scales, December at the top end is an extremely different colour to January at the bottom. In our case December is close to January, so perhaps the colours could be similar? Having a palette that is a similar colour at the two ends would be quite a neat way of emphasising circularity in a linear graph, and even more useful if we then used polar coordinates!

Scales are also discussed in issue #4313

lilyclements commented 3 years ago

I've had a look at the scale_colour_viridis and scale_fill_viridis functions, and have some comments. The following also applies to scale_colour_viridis in the viridis package, and the colour argument in ggplot2:

does not change the fill colour

ggplot(diamonds, aes(price)) + geom_freqpoly(binwidth = 500, aes(fill = clarity)) + scale_fill_viridis(discrete = TRUE)


* If the `fill` argument in `ggplot2` is a factor, then you need to specify `discrete = TRUE` in `scale_fill_viridis`
* If the `fill` argument in `ggplot2` is numeric, then you need to specify `discrete = FALSE` in `scale_fill_viridis`
* `scale_fill_viridis` can only be specified once. This creates an issue if you have two `geom_`'s, where both have a `colour` specified, but only one is a factor. Note that this is an issue in `ggplot2`.
e.g.

In the following, two geoms are specified, each with a colour argument:

If both colours are factors, then it works

ggplot(mpg, aes(displ, cty)) + geom_point(aes(colour = class)) + geom_rug(aes(colour = factor(cyl))) + scale_colour_viridis(discrete = TRUE)

If one colour is a factor, and one colour is continuous, then there is an error

ggplot(mpg, aes(displ, cty)) + geom_point(aes(colour = class)) + geom_rug(aes(colour = cyl)) + scale_colour_viridis(discrete = TRUE)

If both colours are continuous then the code works

ggplot(mpg, aes(displ, cty)) + geom_point(aes(colour = cty)) + geom_rug(aes(colour = cyl)) + scale_colour_viridis(discrete = FALSE)
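The three cases above can be checked programmatically. This is only a sketch: it swaps in ggplot2's own `scale_colour_viridis_d()` / `scale_colour_viridis_c()` for `scale_colour_viridis(discrete = ...)` so that only ggplot2 is needed, and `builds_ok()` is a hypothetical helper name, not an existing function.

```r
library(ggplot2)

# Hypothetical helper: TRUE if ggplot2 can build the plot, FALSE if it errors.
builds_ok <- function(plot) {
  tryCatch(
    {
      ggplot_build(plot)
      TRUE
    },
    error = function(e) FALSE
  )
}

base <- ggplot(mpg, aes(displ, cty))

# Both colour variables are discrete.
both_discrete <- base +
  geom_point(aes(colour = class)) +
  geom_rug(aes(colour = factor(cyl))) +
  scale_colour_viridis_d()

# One discrete and one continuous colour variable on a discrete scale.
mixed <- base +
  geom_point(aes(colour = class)) +
  geom_rug(aes(colour = cyl)) +
  scale_colour_viridis_d()

# Both colour variables are continuous.
both_continuous <- base +
  geom_point(aes(colour = cty)) +
  geom_rug(aes(colour = cyl)) +
  scale_colour_viridis_c()

results <- c(
  both_discrete = builds_ok(both_discrete),
  mixed = builds_ok(mixed),
  both_continuous = builds_ok(both_continuous)
)
print(results)
```

Only the mixed case should fail, because a single colour scale cannot be trained on discrete and continuous values at the same time.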



* There are more specific functions: `scale_fill_viridis_c` for continuous and `scale_fill_viridis_d` for discrete data. However, `scale_fill_viridis` works for both of these cases with the `discrete = TRUE/FALSE` argument specified
* There is also `scale_fill_viridis_b` for binned scales. I have not looked too much into this option, but can consider it further?
* There is a parameter in `scale_fill_viridis` to apply the changes to both `fill` and `colour`. This is `aesthetics = c("fill", "colour")` (this also exists in `scale_colour_viridis`).
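A short sketch of the `aesthetics` argument, using ggplot2's `scale_fill_viridis_d()` (assumed here to behave like `scale_fill_viridis(discrete = TRUE)` in this respect) and the built-in `mpg` data for illustration:

```r
library(ggplot2)

# One scale object that is registered for both the fill and colour aesthetics.
shared_scale <- scale_fill_viridis_d(aesthetics = c("fill", "colour"))

# Box fills and box outlines now take their colours from the same palette.
p <- ggplot(mpg, aes(class, hwy, fill = drv, colour = drv)) +
  geom_boxplot(alpha = 0.6) +
  shared_scale

# The scale records which aesthetics it applies to.
print(shared_scale$aesthetics)
```

With this approach a single set of palette settings covers both aesthetics, which may be simpler than offering the same controls twice in the dialog.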

A design idea, along with the parameters and what they do, is given below. Note that the same parameters (and values) for `scale_fill_viridis` apply to `scale_colour_viridis`.

![image](https://user-images.githubusercontent.com/21180424/109304656-79add500-7834-11eb-9167-f4e4f8629959.png)

**Colour Palette:**
* This is the colour palette in viridis
* Parameter name: `option`
* R-default: `"viridis"`
* Other options: `"magma"`, `"inferno"`, `"plasma"`, `"cividis"`

**Transparency**
* This is how transparent the colours should be (`0` is fully transparent, `1` is fully opaque)
* Parameter name: `alpha`
* R-default: `1`
* Other options: any number in [0, 1]

**Begins**
* This is the position along the colour palette at which the scale begins
* Parameter name: `begin`
* R-default: `0`
* Other options: any number in [0, 1]

**Ends**
* This is the position along the colour palette at which the scale ends
* Parameter name: `end`
* R-default: `1`
* Other options: any number in [0, 1]
* (Side note: the parameter value for `begin` **can be** the same as, or greater than, the parameter value specified for `end`).

**Reverse Order**
* This is to change the direction of the colour palette
* Parameter name: `direction`
* R-default: `1`
* Other options: `-1`. If it is `-1`, the colour palette order reverses

**Apply changes to colour scale**
* This is to apply the changes made to the fill to the colour scale as well
* Parameter name: `aesthetics`
* R-default: not set (the scale then applies to `fill` only)
* If checked, run `aesthetics = c("fill", "colour")`
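To make the parameters above concrete, here is a sketch that sets all of them on one scale. It uses ggplot2's `scale_fill_viridis_d()`, which takes the same `option`, `alpha`, `begin`, `end` and `direction` arguments, and `scales::viridis_pal()` to inspect the colours directly; the particular values are arbitrary.

```r
library(ggplot2)

# All of the dialog's parameters set on a single discrete fill scale.
p <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d(
    option = "cividis",  # Colour Palette
    alpha = 0.8,         # Transparency
    begin = 0.2,         # Begins
    end = 0.9,           # Ends
    direction = -1       # Reverse Order
  )

# The underlying palette function can be called directly to see the colours.
forward <- scales::viridis_pal(option = "magma")(5)
reversed <- scales::viridis_pal(option = "magma", direction = -1)(5)

# direction = -1 yields the same colours in the opposite order.
print(identical(reversed, rev(forward)))
```

Previewing the palette this way could also be useful for showing the user the chosen colours before the graph is drawn.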

1. Ideally the `discrete` parameter would be specified automatically since if the function reads `discrete = TRUE` for a continuous variable, there is an error (and vice versa). I'm not sure how easy this is to implement?
2. I'm not sure where the option to apply the changes to both fill and colour should go, but this feels important. I think it could be confusing if it is offered in each group-box.

lilyclements commented 3 years ago

I also looked at scales for circular data. While I can't find a package which does it directly, we can do it with the scale_fill_hue function in the ggplot2 package. With h = c(0, 360), the hues run around the whole colour wheel, so the palette looks fairly circular.

library(dplyr)
library(forcats)
library(ggplot2)

# I'm not on my R-Instat laptop, and the other elements have turned logical rather than
# numerical in this data set, so I'll just use rain, filtered to values above 0 so the
# colours are more visible!
dodoma.1 <- dodoma %>% filter(rain > 0)
dodoma.1 <- dodoma.1 %>%
  mutate(month_abbr=fct_relevel(month_abbr, c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")))
ggplot2::ggplot(data=dodoma.1, mapping=ggplot2::aes(y=rain, fill=month_abbr, x=(month_abbr))) +
  ggplot2::geom_boxplot() +
  theme_grey() +
  ggplot2::theme(axis.text.x=ggplot2::element_text(angle=90, hjust=1, vjust=0.5)) +
  ggplot2::xlab(NULL) +
  ylim(c(0, 10)) + 
  scale_fill_hue(h = c(0, 360))

image

We can then adjust the intensity of colour (c) and the lightness (l). For example:

ggplot2::ggplot(data=dodoma.1, mapping=ggplot2::aes(y=rain, fill=month_abbr, x=(month_abbr))) +
  ggplot2::geom_boxplot() +
  theme_grey() +
  ggplot2::theme(axis.text.x=ggplot2::element_text(angle=90, hjust=1, vjust=0.5)) +
  ggplot2::xlab(NULL) +
  ylim(c(0, 10)) + 
  scale_fill_hue(h = c(0, 360), c = 50, l = 40)

image
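For anyone without the dodoma data, the same idea can be tried on made-up values. This is only a sketch: the data frame `toy` is randomly generated for illustration, and the polar version is an untested suggestion following the earlier comment about polar coordinates.

```r
library(ggplot2)

# Made-up data: 30 random values for each of the 12 months.
set.seed(1)
months <- factor(month.abb, levels = month.abb)
toy <- data.frame(
  month_abbr = rep(months, each = 30),
  value = rnorm(12 * 30)
)

# Hues spread over the full colour wheel, so Dec sits next to Jan in colour.
p <- ggplot(toy, aes(x = month_abbr, y = value, fill = month_abbr)) +
  geom_boxplot() +
  scale_fill_hue(h = c(0, 360), c = 50, l = 40)

# The same plot in polar coordinates, where Dec and Jan are also adjacent.
p_polar <- p + coord_polar()

# The twelve colours that the scale assigns, one per month.
month_colours <- scales::hue_pal(h = c(0, 360), c = 50, l = 40)(12)
names(month_colours) <- month.abb
print(month_colours)
```

My understanding is that when the hue range covers the whole circle, the hues are spaced evenly without the last colour repeating the first, which is what makes December and January neighbours rather than duplicates.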

lilyclements commented 3 years ago

@dannyparsons I was wondering if I could have your opinion on something with this dialog.

Both scale_fill_viridis and scale_colour_viridis functions have a parameter discrete which takes values TRUE and FALSE. This all applies for the colour option, but I'll just explain by using the fill option:

If the fill variable in the ggplot2 code is a discrete variable, then in scale_fill_viridis you have to set discrete = TRUE, otherwise there is an error.

@Ivanluv is setting up these options in the plots sdg, and is using the boxplot dialog to test it. Currently in the boxplot dialog there are two ways to add a fill variable into the ggplot2 model: a) add a second factor variable into the receiver on the main dialog; b) add/edit a layer in on the subdialog.

Route (a) means that the fill colour is discrete, and hence discrete = TRUE (note, however, that this is a restriction imposed by R-Instat, as R allows numeric variables to be a fill colour). Route (b) means that it can be either discrete or continuous.

I can see a few options here, and was wondering what was best:

  1. Is there a way for the class of the fill variable in the ggplot2 object to be detected and then R-Instat can automatically run this discrete parameter (I assume this is too time consuming for now?).
  2. Since route (a) means that the fill colour is always discrete, should we just fix discrete = TRUE and not have it as an option in the sdg for now?
  3. Should the discrete option be a manual checkbox on the sdg so it is down to the user to sort out, and if they get an error then so be it. (Default: discrete = TRUE)

dannyparsons commented 3 years ago

I don't think option 1 is too difficult and we should aim for that. We have something similar in ucrAxes, though I don't think we have the actual variable detection being done. All we need is a simple R function to return "continuous" or "discrete" that is run when this loads.

There is some detail to get right in finding the right column to check, because we need to check the global aes() as well as the aes() in each of the layers to know whether it's being used. But I think we just need to find the first instance, because we can assume that multiple different fills will be the same type (I think, otherwise the ggplot won't run?)

I think there will be some different parameters to the scale_ function in the two different cases, so we do need the option.

I think we could also have the scale_fill and scale_colour options being disabled when there is no such fill/colour variable in the ggplot?
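A rough sketch of what such a function might look like. The name `aes_scale_type()` is hypothetical and nothing like it exists in R-Instat as far as I know; it assumes a plot's `mapping`, `layers` and `data` can be read with `$`, and it ignores details such as `inherit.aes` and layers whose data is a function.

```r
library(ggplot2)

# Hypothetical helper: look at the global aes() and then each layer's aes(),
# and classify the first variable found for `aesthetic`.
# Returns "discrete", "continuous", or NA if the aesthetic is not mapped.
aes_scale_type <- function(plot, aesthetic = "fill") {
  candidates <- list(list(mapping = plot$mapping, data = plot$data))
  for (layer in plot$layers) {
    # A layer without its own data frame falls back to the plot's data.
    layer_data <- if (is.data.frame(layer$data)) layer$data else plot$data
    candidates <- c(candidates, list(list(mapping = layer$mapping, data = layer_data)))
  }
  for (cand in candidates) {
    quo <- cand$mapping[[aesthetic]]
    if (is.null(quo)) next
    data <- if (is.data.frame(cand$data)) cand$data else NULL
    values <- rlang::eval_tidy(quo, data = data)
    is_discrete <- is.factor(values) || is.character(values) || is.logical(values)
    return(if (is_discrete) "discrete" else "continuous")
  }
  NA_character_
}

p_discrete <- ggplot(mpg, aes(displ, cty)) + geom_point(aes(colour = class))
p_continuous <- ggplot(mpg, aes(displ, cty)) + geom_point(aes(colour = cyl))

print(aes_scale_type(p_discrete, "colour"))
print(aes_scale_type(p_continuous, "colour"))
print(aes_scale_type(p_discrete, "fill"))
```

The `NA` result for an unmapped aesthetic could drive the disabling of the scale_fill / scale_colour options, and the "discrete" / "continuous" result could set the `discrete` parameter (or choose between the `_d` and `_c` functions) automatically.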

Happy to discuss.