ProjectMOSAIC / ggformula

Provides a formula interface to 'ggplot2' graphics.

`gf_counts` with an integer variable uses decimals on x-scale #111

Closed lahvak closed 5 years ago

lahvak commented 6 years ago

When creating a bar chart of a discrete numerical variable using `gf_counts`, the x-axis has ticks and labels at decimal values, even though all values of the variable are integers.

I understand that in many cases integer values actually represent a continuous variable. In such cases, though, it would in my opinion make little sense to use a count-based bar chart to display the data: a histogram with a bin width of 1 would be more appropriate. I believe that a bar chart should by default have a discrete scale on the x-axis.

It is of course possible to convert the integer variable to a factor, specifying the levels as min:max of the variable and setting drop = FALSE for the discrete x-scale. That is, however, a lot of work to get a result that should be the default. The ggformula package is often used in introductory statistics courses, and my experience is that the current behavior is confusing for students in such courses.
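For reference, the factor workaround described above might look roughly like the following sketch. The data frame `d` and its columns `k` and `k_f` are made up for illustration; only `gf_counts()`, `gf_refine()`, and ggplot2's `scale_x_discrete(drop = FALSE)` are existing functions.

```r
library(ggformula)

# Hypothetical data: integer values with a gap (no observations equal to 3).
d <- data.frame(k = c(0L, 1L, 1L, 2L, 2L, 2L, 4L, 5L, 5L))

# Convert to a factor whose levels span min:max, so empty values keep a slot.
d$k_f <- factor(d$k, levels = min(d$k):max(d$k))

# drop = FALSE keeps unused levels (here, 3) on the discrete x-axis.
p <- gf_counts(~ k_f, data = d) %>%
  gf_refine(scale_x_discrete(drop = FALSE))
p
```

Every integer from 0 to 5 then gets its own axis label, including the empty one, but it takes three extra steps compared with a plain `gf_counts(~ k, data = d)`.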

rpruim commented 5 years ago

Finally getting back to this. You didn't provide an example, so I tried creating one, and the only instance I came up with quickly was when the data consisted of two consecutive integers. Are you seeing this more often?

Another option for this sort of plot is to use a histogram with binwidth = 1. That's my usual way of plotting integer data.

library(ggformula)
gf_histogram(~ rbinom(100, 15, 0.4), binwidth = 1)

Created on 2019-01-09 by the reprex package (v0.2.1)

rpruim commented 5 years ago

Before closing this, I decided to add a little function that can assist in setting the breaks used for a plot. Currently the name is discrete_breaks(). Setting a resolution gives breaks at multiples of that resolution, roughly spanning the range of the data. The default resolution is 1, which provides breaks at each integer.

suppressPackageStartupMessages(library(ggformula))
x <- rbinom(100, 100, 0.4)
p <- gf_bar( ~ x)
p %>% gf_refine(scale_x_continuous(breaks = discrete_breaks()))

p %>% gf_refine(scale_x_continuous(breaks = discrete_breaks(5)))

p %>% gf_refine(scale_x_continuous(breaks = discrete_breaks(2)))

Created on 2019-06-19 by the reprex package (v0.3.0)
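To illustrate the idea of breaks at multiples of a resolution, here is a minimal base-R sketch. It is not the ggformula implementation of discrete_breaks(); the name `breaks_sketch` is hypothetical, and the only assumption is that ggplot2 scales accept a function of the axis limits for their `breaks` argument.

```r
# Hypothetical stand-in, NOT the ggformula source: returns a function that
# maps axis limits to breaks at multiples of `resolution`.
breaks_sketch <- function(resolution = 1) {
  function(limits) {
    # Round the lower limit down and the upper limit up to the nearest
    # multiple of the resolution, then step between them.
    lo <- floor(limits[1] / resolution) * resolution
    hi <- ceiling(limits[2] / resolution) * resolution
    seq(lo, hi, by = resolution)
  }
}

breaks_sketch()(c(2.6, 6.4))   # every integer from 2 to 7
breaks_sketch(5)(c(31, 52))    # multiples of 5 from 30 to 55
```

A function like this could be passed as `scale_x_continuous(breaks = breaks_sketch(5))` in the same way as discrete_breaks() in the examples above.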