ProjectMOSAIC / ggformula

Provides a formula interface to 'ggplot2' graphics.

gf_errorbar with position=position_dodge(.9) not in the middle #120

Closed EricSHo closed 5 years ago

EricSHo commented 5 years ago

Hi,

Could someone help me understand why the error bars are not centered on their bars? I have tried both position_dodge(.9) and position_dodge2(.9).

sex.cesd.homeless = HELPrct %>%
  group_by(sex, homeless) %>%
  summarise(cesd.sd = sd(cesd), cesd.mean = mean(cesd))

head(sex.cesd.homeless)

gf_bar(cesd.mean ~ homeless, fill = ~sex, stat = 'identity', position = 'dodge',
       data = sex.cesd.homeless) %>%
  gf_errorbar((cesd.mean - cesd.sd) + (cesd.mean + cesd.sd) ~ homeless,
              width = 0.2, position = position_dodge2(.9)) %>%
  gf_labs(y = 'Average CESD')

[Images: the resulting plots with position_dodge2 and with position_dodge]

Thanks a lot.

Eric.

EricSHo commented 5 years ago

I forgot to say that it works when I code it in ggplot2. But I'm using ggformula for the class, so I want to stick with ggformula. Here's the ggplot2 equivalent:

ggplot(sex.cesd.homeless, aes(x = homeless, y = cesd.mean, fill = sex)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  geom_errorbar(aes(ymin = cesd.mean - cesd.sd, ymax = cesd.mean + cesd.sd),
                position = position_dodge(.9), width = .2)

[Image: the plot produced by the ggplot2 code]

rpruim commented 5 years ago

The issue is that the group aesthetic is not being inherited (by default) in gf_errorbar(). I'm trying to remember if there is a reason for that default. If not, it is an easy thing to change it from FALSE to TRUE.

In the meantime, there are two ways to get what you want using the current version: set inherit = TRUE or set group = ~ sex in the call to gf_errorbar().

As a bonus, I'll demonstrate how to use df_stats() to create the summary.

library(mosaicData)
library(ggformula)
#> Loading required package: ggplot2
#> Loading required package: ggstance
#> 
#> Attaching package: 'ggstance'
#> The following objects are masked from 'package:ggplot2':
#> 
#>     geom_errorbarh, GeomErrorbarh
#> 
#> New to ggformula?  Try the tutorials: 
#>  learnr::run_tutorial("introduction", package = "ggformula")
#>  learnr::run_tutorial("refining", package = "ggformula")
sex.cesd.homeless <-
  df_stats(cesd ~ sex + homeless, data = HELPrct, mean, sd)
sex.cesd.homeless
#>      sex homeless mean_cesd  sd_cesd
#> 1 female homeless  38.42500 11.86654
#> 2   male homeless  32.98225 12.22530
#> 3 female   housed  35.97015 13.66257
#> 4   male   housed  30.27684 11.86990

p1 <-
  gf_col(mean_cesd ~ homeless, fill = ~ sex, position = position_dodge(), group = ~ sex,
       data = sex.cesd.homeless) %>%
  gf_labs(y='Average CESD')

# fix #1: inherit = TRUE
p1 %>%
  gf_errorbar((mean_cesd - sd_cesd) + (mean_cesd + sd_cesd) ~ homeless, 
              data = sex.cesd.homeless,
              width = 0.2 , position = position_dodge(0.9), inherit = TRUE) 


# fix #2: group = ~ sex
p1 %>%
  gf_errorbar((mean_cesd - sd_cesd) + (mean_cesd + sd_cesd) ~ homeless, 
              data = sex.cesd.homeless,
              width = 0.2 , position = position_dodge(0.9), group = ~ sex) 

Created on 2019-03-09 by the reprex package (v0.2.1)

rpruim commented 5 years ago

Fortunately, I left a comment trail... Looks like I can fix this and change inherit back to TRUE by default.

The issue is that in an older version of ggplot2, the aesthetics for geom_errorbar() were named so that they caused problems if we inherited from previous layers. I believe this has been fixed (and is reflected in the rest of the code for gf_errorbar()), but the default for inherit wasn't changed back to TRUE and there is some cruft in the documentation related to the situation from before ggplot2 was updated on CRAN.

Unless I discover problems, I plan to make inherit = TRUE the default in the next release.

EricSHo commented 5 years ago

Thank you Randy.

Three stupid questions:

  1. df_stats is great. But why is it called df_stats instead of gf_stats?
  2. gf_errorbar(... group = ~sex). Wouldn't it be more consistent to use gf_errorbar(... fill = ~sex)?
  3. Other than the mosaic student guides and this forum, are there any resources that can help me learn ggformula in a more systematic way?

Eric.

rpruim commented 5 years ago

  1. Because it produces data, not a graph.

  2. It is group that matters, and the error bars don't have a fill aesthetic. The group aesthetic sometimes gets set automatically to do what you want, but not always. Sometimes you have to set it manually, so it's good to know about.

  3. Three answers:

    a. Start here: https://projectmosaic.github.io/ggformula/. What we produce will likely be findable there. We should probably add links to things other people have done, so if you find something useful, let us know. (Opening an issue is probably the best way to make sure it gets on the radar.)

    b. Hmm. I'm not sure that page links to the textbooks for which we have produced accompanying documents. Until I get the ggformula web site updated with links, you can find those at https://projectmosaic.github.io/mosaic/articles/mosaic-resources.html#textbook-related. The newer ones use ggformula. Older ones use mosaic and lattice.

    c. If we produced some additional things, what would you like to see? (At some point we should redo the "Start Teaching Statistics with R" little book, probably using bookdown, updated to use ggformula. That's already on the wish list.)
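The point in answer 2 (dodging depends on group, not fill) can be sketched in plain ggplot2, which avoids any question of layer inheritance. This is only an illustration, not code from the ggformula package: the data frame is typed in by hand from the df_stats() output shown earlier in the thread, and errorbar_plot() is a hypothetical helper introduced just for this example.

```r
library(ggplot2)

# Summary values copied by hand from the df_stats() output earlier in the thread
sex.cesd.homeless <- data.frame(
  sex       = c("female", "male", "female", "male"),
  homeless  = c("homeless", "homeless", "housed", "housed"),
  mean_cesd = c(38.42500, 32.98225, 35.97015, 30.27684),
  sd_cesd   = c(11.86654, 12.22530, 13.66257, 11.86990)
)

# Hypothetical helper (not part of ggplot2 or ggformula):
# builds a plot containing only a dodged error-bar layer
errorbar_plot <- function(mapping) {
  ggplot(sex.cesd.homeless, mapping) +
    geom_errorbar(width = 0.2, position = position_dodge(0.9))
}

# No group (and no fill): rows are grouped by x alone, so nothing is dodged
no_group <- errorbar_plot(
  aes(x = homeless, ymin = mean_cesd - sd_cesd, ymax = mean_cesd + sd_cesd)
)

# Explicit group: each sex becomes its own group within each x value
with_group <- errorbar_plot(
  aes(x = homeless, ymin = mean_cesd - sd_cesd, ymax = mean_cesd + sd_cesd,
      group = sex)
)

# Horizontal positions actually used when the layer is drawn
x_no_group   <- layer_data(no_group)$x
x_with_group <- layer_data(with_group)$x
```

Comparing x_no_group with x_with_group should show the two error bars in each category sharing one position in the first case and being offset from each other in the second, which is the behavior described in the original question.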

One more minor comment. In both ggplot2 and ggformula you can use geom_col() or gf_col() instead of using stat = "identity" with the bar geom. I guess it is a matter of taste whether this is preferable, but if you don't want to introduce the idea of stats, you can avoid it.
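As a rough side-by-side of that comment, the sketch below draws the same summary three ways: geom_bar() with stat = "identity", geom_col(), and gf_col(). It assumes the summary values printed earlier in the thread, typed in by hand so the example does not depend on mosaicData. The object names p_bar, p_col, and p_gf are made up for the illustration.

```r
library(ggformula)  # also attaches ggplot2

# Summary values copied by hand from the df_stats() output earlier in the thread
sex.cesd.homeless <- data.frame(
  sex       = c("female", "male", "female", "male"),
  homeless  = c("homeless", "homeless", "housed", "housed"),
  mean_cesd = c(38.42500, 32.98225, 35.97015, 30.27684)
)

# ggplot2, bar geom with the identity stat
p_bar <- ggplot(sex.cesd.homeless, aes(x = homeless, y = mean_cesd, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge())

# ggplot2, column geom: no stat argument needed
p_col <- ggplot(sex.cesd.homeless, aes(x = homeless, y = mean_cesd, fill = sex)) +
  geom_col(position = position_dodge())

# ggformula, column geom
p_gf <- gf_col(mean_cesd ~ homeless, fill = ~ sex,
               position = position_dodge(), data = sex.cesd.homeless)
```

All three objects are ordinary ggplot objects, so printing any of them should display the same dodged bars.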

rpruim commented 5 years ago

PS. Have you seen the two tutorials that come with the package? There is a message about them each time you load the ggformula package.

rpruim commented 5 years ago

I've added https://projectmosaic.github.io/ggformula/articles/learn-more.html with links to external resources that use ggformula.

EricSHo commented 5 years ago

Do you know what the col in gf_col/geom_col stands for?

rpruim commented 5 years ago

@EricSHo, the name gf_col() is inherited from ggplot2::geom_col(). I'm guessing it stands for "column", but I don't know that with certainty.