gge-ucd / Discussion

Class discussion for R-DAVIS course

using geom_line and color = sex doesn't actually color the lines by sex #43

Closed mkclapp closed 5 years ago

mkclapp commented 6 years ago

Has anyone else noticed that when they're making the plot for the w7 assignment, the resulting plot doesn't actually assign different colors by sex?

ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(~species_id)

results in the plot that we're targeting, but all the bars are pink, making it seem like there are only females in the dataset. image

whereas using geom_point actually colors the measurements by sex:

ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) + geom_point(alpha = 0.5) + facet_wrap(~species_id) + theme_bw()
image

i tried moving the color = sex into an aes() inside geom_line() instead of the main ggplot(), but that didn't work. and i tried messing with the transparency just in case all the blue values were hiding behind the pink ones, but that was not the case.

Why? @ryanpeek @mikoontz @MarthaWohlfeil @MCMaurer ?

mikoontz commented 6 years ago

Hmm, good question. The plot on the website even seems a bit off (more like the one you have on the top than one that properly colors two separate lines): https://gge-ucd.github.io/R-DAVIS/lesson_import_dplyr_ggplot2.html#visualize_the_data

When you run packageVersion("ggplot2"), what do you get?

I get

> packageVersion("ggplot2")
[1] ‘2.2.1.9000’

MCMaurer commented 6 years ago

Just looking at your two plots, it seems like what your geom_line is doing is connecting the dots between male and female points for each year on each plot. I don't think this is the plot you're shooting for, are you looking to do a stacked bar plot instead? If so, I think geom_col would do the trick.

mkclapp commented 6 years ago

[1] ‘2.2.1.9000’ as well.

Yes, I agree... I wasn't sure what geom_line() is doing there, but I think @MCMaurer's guess is probably right.

ryanpeek commented 6 years ago

Hi @mkclapp, good question! Let's chat more about this in class... I used geom_col in my w7 assignment, as geom_line can do some weird things when grouping / mapping across categories (like sex, year, etc). In this case we are trying to map geom_line to a single value for each year, so I think it's drawing a line from the F to the M point for each year and species, but using the first value as the color (because F comes before M). So generally we don't want to use geom_line for categorical data, unless we are mapping a geom_line for each sex through time (and not wrapping by species).

May not have done a great job explaining this, but we'll talk more tomorrow.

mikoontz commented 6 years ago

Hmm. When we specify color = sex, it should be doing the grouping implicitly. I think there's something more insidious here. @mkclapp, when I clone your whole repository and knit your document, I get the plot I expect (with a different-colored line through time for each sex).

In your funky_plot chunk, just after the ggplot() function call that gets you the weird looking plot, can you add the line sessionInfo()?

This will print all the attached packages and their versions (as well as your R version and operating system). I'm trying to figure out what is different about our computers such that running the same code results in two different plots!

Here's a screenshot of the knitted document I get using your code (and then adding the sessionInfo() line). The plot looks like I'd expect (and different from yours). How does the session info compare?

screen shot 2018-02-27 at 7 49 59 am

mikoontz commented 6 years ago

@ryanpeek, since your computer's been the one doing the website knitting and the plot here looks goofy, I assume you're getting a funny looking plot from @mkclapp's code too? Can you clone my repository and see what happens when you knit my week 7 assignment (which produces the right result on my machine)?

ryanpeek commented 6 years ago

@mikoontz Will do! Could be a version difference...looks like you have 3.4.2, and I'm using 3.4.3. Here's my session info:

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] forcats_0.3.0      stringr_1.3.0      dplyr_0.7.4       
[4] purrr_0.2.4        readr_1.1.1        tidyr_0.8.0       
[7] tibble_1.4.2       ggplot2_2.2.1.9000 tidyverse_1.2.1   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      cellranger_1.1.0  pillar_1.2.0     
 [4] compiler_3.4.3    plyr_1.8.4        bindr_0.1        
 [7] tools_3.4.3       lubridate_1.7.2   jsonlite_1.5     
[10] nlme_3.1-131.1    gtable_0.2.0      lattice_0.20-35  
[13] pkgconfig_2.0.1   rlang_0.2.0       psych_1.7.8      
[16] cli_1.0.0         rstudioapi_0.7    yaml_2.1.16      
[19] parallel_3.4.3    haven_1.1.1       bindrcpp_0.2     
[22] withr_2.1.1.9000  xml2_1.2.0        httr_1.3.1       
[25] hms_0.4.1         grid_3.4.3        glue_1.2.0       
[28] R6_2.2.2          readxl_1.0.0      foreign_0.8-69   
[31] modelr_0.1.1      reshape2_1.4.3    magrittr_1.5     
[34] scales_0.5.0.9000 rvest_0.3.2       assertthat_0.2.0 
[37] mnormt_1.5-5      colorspace_1.3-2  stringi_1.1.6    
[40] lazyeval_0.2.1    munsell_0.4.3     broom_0.4.3      
[43] crayon_1.3.4     

ryanpeek commented 6 years ago

@mkclapp @mikoontz...Welp, looks like this is something related to the versions...I'm definitely getting the same weird line plot shown on the website / what @mkclapp showed. I'm not clear on what specifically it is, or why the behavior is different when you move the data= and aes() components out of the main ggplot call and into the geom_line call. Seems this may be something we need to ask Hadley or someone familiar with the inner workings of ggplot?

data= Inside ggplot():

And running either of these pieces of code:

ggplot(data=yearly_sex_counts, aes(x = year, y = n, color = sex)) +  geom_line()

ggplot(data=yearly_sex_counts) +
    geom_line(aes(x = year, y = n, color = sex))

And you get these plots:

screen shot 2018-02-27 at 8 27 50 am screen shot 2018-02-27 at 8 27 37 am

data= inside geom_line()

Running this code:

ggplot() +
  geom_line(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) +
  facet_wrap(~ species_id)

Yields

screen shot 2018-02-27 at 8 36 29 am

MCMaurer commented 6 years ago

I'm getting the regular line plots running with 3.3.3, just for more evidence towards a version-specific issue.

However, I reproduced the funky plot with the following code:

ggplot(data = yearly_sex_counts, 
       aes(x = year, y = n, color=sex, group=year)) +
  geom_line() +
  facet_wrap(~species_id, nrow = 5) +
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

I think that what's happening is ggplot in your version of R is somehow automatically using year as the grouping even when you pass color=sex, whereas when you put the data all into geom_line, it properly recognizes that you're using color=sex and sets group=sex automatically.

Note that if you change group=year to group=sex in my example, it creates the proper non-funky line plot.
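
To make the group=year versus group=sex comparison concrete, here is a minimal sketch. The `toy` data frame is a hypothetical stand-in for yearly_sex_counts (its values are made up), and the sketch only counts how many groups ggplot2 builds for the line layer; it does not try to reproduce the version-specific bug itself.

```r
library(ggplot2)

# Hypothetical toy data standing in for yearly_sex_counts (values are invented)
toy <- data.frame(
  year = rep(2000:2002, each = 2),
  sex  = rep(c("F", "M"), times = 3),
  n    = c(5, 7, 6, 4, 8, 9)
)

# Grouping by year: each "line" only connects the F and M points within one year
p_year <- ggplot(toy, aes(x = year, y = n, color = sex, group = year)) +
  geom_line()

# Grouping by sex: one line through time for each sex
p_sex <- ggplot(toy, aes(x = year, y = n, color = sex, group = sex)) +
  geom_line()

# layer_data() returns the data ggplot2 computed for a layer, including `group`
n_groups_year <- length(unique(layer_data(p_year)$group))
n_groups_sex  <- length(unique(layer_data(p_sex)$group))

n_groups_year  # 3, one group per year
n_groups_sex   # 2, one group per sex
```

Counting groups with layer_data() is a quick way to check what a line layer will connect without having to eyeball the plot.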

mikoontz commented 6 years ago

Okay, I updated R to 3.4.3 and still can't produce the wrong plot.

I run this:

screen shot 2018-02-27 at 9 33 15 am

with this as my session info:

screen shot 2018-02-27 at 9 41 24 am

In this minimum example:

https://github.com/gge-ucd/learning-tidyverse-mikoontz/blob/master/docs/ggplot_weirdness_reprex_mikoontz.Rmd

mikoontz commented 6 years ago

Can @ryanpeek @MCMaurer and (if you're up for it) @mkclapp save the Rmd file I linked to, knit it, and see what happens?

It's a more minimal example (without the extra assignment workflow), so I'm hoping it will point more directly to the problem.

MCMaurer commented 6 years ago

Just did that on a fresh session, running 3.3.3, my plot comes out just fine. I'm also running ggplot2_2.2.1, perhaps the ggplot version is driving this?

mikoontz commented 6 years ago

The ggplot2 versions look the same to me (2.2.1.9000). That's what Mary said she was running also. I just noticed my tidyverse version is different. Going to try that next.

Edit: nope that didn't break it.

Trying the different forcats versions now. (I have an old one)

mkclapp commented 6 years ago

Eep I can't keep up!

screen shot 2018-02-27 at 9 09 20 am

I'm running 3.4.3 as well. @mikoontz, when I knit the minimal document you linked to, I still get the weird plot: screen shot 2018-02-27 at 9 45 06 am

MCMaurer commented 6 years ago

It's definitely the ggplot version. I just installed ggplot from github with devtools and knit @mikoontz's .Rmd and I got the funky plot. When I was running ggplot2_2.2.1, it was fine, but now with 2.2.1.9000, your .Rmd makes the funky plot.

mikoontz commented 6 years ago

But... sessionInfo says we have the same ggplot2 version. How can it be the ggplot2 version if we are using the same one? Is the GitHub version called the same thing as the CRAN version?

MCMaurer commented 6 years ago

I think 2.2.1 must be the last stable build whereas 2.2.1.9000 is the latest dev version that has the funkiness.
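
A small sketch of why the version number alone can't settle this: by convention a fourth component of 9000 marks a development build, and that number usually stays the same across many commits, so two installs that both report 2.2.1.9000 can contain different code. The helpers `is_dev_version()` and `installed_commit()` are hypothetical names made up for this illustration, and the second one assumes the package was installed with devtools or remotes, which record a RemoteSha field in the installed DESCRIPTION.

```r
# package_version() objects compare component by component
cran_version <- package_version("2.2.1")
dev_version  <- package_version("2.2.1.9000")
dev_is_newer <- dev_version > cran_version

# Hypothetical helper: TRUE when the fourth version component is 9000 or more
is_dev_version <- function(v) {
  parts <- unclass(package_version(v))[[1]]
  length(parts) >= 4 && parts[4] >= 9000
}

# Hypothetical helper: the commit recorded at install time, or NA if there is
# none (for example, a CRAN install). Assumes a devtools/remotes install.
installed_commit <- function(pkg) {
  utils::packageDescription(pkg, fields = "RemoteSha")
}

is_dev_version("2.2.1.9000")
is_dev_version("2.2.1")
installed_commit("stats")  # base packages carry no RemoteSha field
```

Comparing `installed_commit("ggplot2")` across machines would show whether two "2.2.1.9000" installs were actually built from the same commit.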

mikoontz commented 6 years ago

Right, and I'm using 2.2.1.9000 and don't get funkiness!

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] forcats_0.3.0      stringr_1.3.0      dplyr_0.7.4       
[4] purrr_0.2.4        readr_1.1.1        tidyr_0.8.0       
[7] tibble_1.4.2       ggplot2_2.2.1.9000 tidyverse_1.2.1   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      cellranger_1.1.0  pillar_1.1.0     
 [4] compiler_3.4.3    plyr_1.8.4        bindr_0.1        
 [7] tools_3.4.3       digest_0.6.15     lubridate_1.7.1  
[10] jsonlite_1.5      evaluate_0.10.1   gtable_0.2.0     
[13] nlme_3.1-131      lattice_0.20-35   pkgconfig_2.0.1  
[16] rlang_0.1.6       psych_1.7.5       cli_1.0.0        
[19] rstudioapi_0.7    yaml_2.1.16       parallel_3.4.3   
[22] haven_1.1.0       bindrcpp_0.2      xml2_1.1.1       
[25] httr_1.3.1        knitr_1.20        hms_0.3          
[28] rprojroot_1.3-2   grid_3.4.3        glue_1.2.0       
[31] R6_2.2.2          readxl_1.0.0      foreign_0.8-69   
[34] rmarkdown_1.8.7   modelr_0.1.1      reshape2_1.4.2   
[37] magrittr_1.5      backports_1.1.2   scales_0.5.0.9000
[40] htmltools_0.3.6   rvest_0.3.2       rsconnect_0.8.5  
[43] assertthat_0.2.0  mnormt_1.5-5      colorspace_1.3-2 
[46] stringi_1.1.6     lazyeval_0.2.1    munsell_0.4.3    
[49] broom_0.4.2       crayon_1.3.4    

mikoontz commented 6 years ago

Thanks @mkclapp! That's so strange. My next step was going to be to look at the operating system differences between me and Ryan, but you and I have the same one!

MCMaurer commented 6 years ago

Shoot, I can't keep up with all the different sessionInfo()s!

Well, I changed absolutely nothing except I installed ggplot2 from GitHub, and when I ran your .rmd again it went funky.

Seems like @mkclapp and I are both running tidyverse 1.2.1 and you're using 1.1.1, I wonder if that's somehow interacting with ggplot?

mikoontz commented 6 years ago

Nope, just updated that. I'm running tidyverse 1.2.1 (see most recent comment above)

MCMaurer commented 6 years ago

Wanna try running sessionInfo() after you've loaded tidyverse?

Here's what I get from that:

## R version 3.3.3 (2017-03-06)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.2.0      stringr_1.2.0      dplyr_0.7.4       
## [4] purrr_0.2.4        readr_1.1.1        tidyr_0.8.0       
## [7] tibble_1.4.2       ggplot2_2.2.1.9000 tidyverse_1.2.1   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.15      cellranger_1.1.0  pillar_1.1.0     
##  [4] plyr_1.8.4        bindr_0.1         tools_3.3.3      
##  [7] digest_0.6.15     lubridate_1.7.2   jsonlite_1.5     
## [10] evaluate_0.10.1   nlme_3.1-131      gtable_0.2.0     
## [13] lattice_0.20-35   pkgconfig_2.0.1   rlang_0.2.0.9000 
## [16] psych_1.7.8       cli_1.0.0         rstudioapi_0.7   
## [19] parallel_3.3.3    haven_1.1.1       bindrcpp_0.2     
## [22] withr_2.1.1.9000  xml2_1.2.0        httr_1.3.1       
## [25] knitr_1.19        hms_0.4.1         rprojroot_1.3-2  
## [28] grid_3.3.3        glue_1.2.0        R6_2.2.2         
## [31] readxl_1.0.0      foreign_0.8-69    rmarkdown_1.8    
## [34] modelr_0.1.1      reshape2_1.4.3    magrittr_1.5     
## [37] backports_1.1.2   scales_0.5.0.9000 htmltools_0.3.6  
## [40] rvest_0.3.2       assertthat_0.2.0  mnormt_1.5-5     
## [43] colorspace_1.3-2  stringi_1.1.6     lazyeval_0.2.1   
## [46] munsell_0.4.3     broom_0.4.3       crayon_1.3.4

I noticed me and @mkclapp are running a different version of rlang from you, @mikoontz , and ours is a .9000 version

mikoontz commented 6 years ago

Yes, that's what I've done. Loading the tidyverse first, then putting sessionInfo.

Differences I see:
My R version is newer (but that doesn't seem to be it)
My macOS version is newer (but that doesn't seem to be it, since Mary and I are using the same version)
My forcats version is newer (but I just updated that; it's not that)
My stringr version is newer

I load via namespace compiler_3.4.3 (not sure why)
My tools_3.4.3 is newer
My lubridate is older
My rlang is older (I'll try this next, as maybe it's a non-standard evaluation thing)
My psych is older
My haven is older
My xml2 is older
My knitr is newer (can you produce a bad plot outside of a knitted document?)
My hms is older

anyway... some differences. I made the example more minimal by just using library(ggplot2) and still can't break the plot: https://github.com/gge-ucd/learning-tidyverse-mikoontz/blob/master/docs/ggplot_weirdness_reprex_mikoontz.Rmd

mkclapp commented 6 years ago

I just updated all my packages (Tools > Check for Package Updates..) and also loaded sessionInfo after 'tidyverse'. Still got the wonky plot.

screen shot 2018-02-27 at 10 10 35 am

And I copied/pasted all the code to a regular .R script and also still got the wonky plot.

mikoontz commented 6 years ago

You are an excellent debugging partner. I didn't know about the Tools > Check for packages updates trick. Let me try that now.

What version of RStudio are you running? I'm on 1.1.414

mkclapp commented 6 years ago

:] I had no idea what I was starting, but I am curious to get to the bottom of it now!

(Also @MCMaurer side tidbit-- how did you copy/paste your sessionInfo directly into this issue thread? when I copied/pasted, this text box interpreted the ## as section headers.)

mikoontz commented 6 years ago

Yay!! I broke it!!

Previously, I had installed the regular CRAN version of ggplot2 (even though the development version 2.2.1.9000 was working). So I was using ggplot 2.2.1 for this next step.

First, I used your Tools > Check for Package Updates trick to update all my packages. I reran my code and still got the right plot.

Then I used the install_github("hadley/ggplot2") to get the 2.2.1.9000 development version of ggplot2 from GitHub.

Now the plot is broken!

So it's not ggplot2 development version per se, because that's what I was running before and it was still working. It appears to be how the ggplot2 development version plays with other packages.

I'll keep digging, but at least I can reproduce the funk.

mikoontz commented 6 years ago

Welp, these are the packages that updated when I looked for updates to everything:

install.packages(c("BH", "broom", "callr", "checkmate", "chron", "devtools", "DT", "gapminder", "geosphere", "git2r", "googleway", "gridExtra", "haven", "Hmisc", "hms", "htmlTable", "htmlwidgets", "httpuv", "irlba", "lme4", "MASS", "matrixStats", "mgcv", "mvtnorm", "nlme", "NLP", "pillar", "plotrix", "psych", "quantreg", "raster", "RcppEigen", "reprex", "reshape2", "reticulate", "rgl", "rpart", "rstan", "rstanarm", "Rttf2pt1", "shinyjs", "slam", "sp", "StanHeaders", "tfruns", "tidyselect", "tm", "topmodel", "units", "viridis", "viridisLite", "XML", "xml2", "zoo"))

So one of those doesn't play nicely with the development version of ggplot2.

Just to confirm here, if you run install.packages("ggplot2") to install the CRAN version of the ggplot2 package, does your plot look right?

MCMaurer commented 6 years ago

Alright, so it sounds like we had a similar thing, where installing from hadley's repo caused the issue.

One thing I noticed was that after installing from github, my scales, withr, and rlang packages were also .9000 dev versions. However, @mikoontz, it looked like when you weren't making the funky plot, but you were still running ggplot2_2.2.1.9000, you weren't running the dev versions of rlang and withr. My guess was that installing from github also installed these dependencies as dev versions, and this is what caused the funkiness.

And @mkclapp, when you copy/paste a big chunk of code, just put 3 of these ` in front of the code and 3 of them after it!

ryanpeek commented 6 years ago

Same for me...I definitely think it's some odd conflict with the dev version of ggplot2. I still get the funky plot and have the most recent version of R (and same operating system as @mikoontz).

My session info:

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ggplot2_2.2.1.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      digest_0.6.15     assertthat_0.2.0 
 [4] dplyr_0.7.4       withr_2.1.1.9000  grid_3.4.3       
 [7] plyr_1.8.4        R6_2.2.2          gtable_0.2.0     
[10] magrittr_1.5      scales_0.5.0.9000 pillar_1.2.0     
[13] rlang_0.2.0       lazyeval_0.2.1    bindrcpp_0.2     
[16] labeling_0.3      tools_3.4.3       glue_1.2.0       
[19] munsell_0.4.3     yaml_2.1.16       compiler_3.4.3   
[22] pkgconfig_2.0.1   colorspace_1.3-2  knitr_1.20       
[25] bindr_0.1         tibble_1.4.2     

mikoontz commented 6 years ago

Ah ha, so maybe the development versions of withr or rlang are causing trouble!

I figured out more: there's only trouble if you pass a tibble to the ggplot that still has a lingering grouping associated with it. So this now works, even with the development versions of ggplot2, rlang, and withr.

ggplot(data = ungroup(yearly_sex_counts), aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap( ~ species_id)

Producing:

image

With this session Info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.2.1.9000 bindrcpp_0.2       dplyr_0.7.4        shiny_1.0.5       
[5] devtools_1.13.5   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15      bindr_0.1         pillar_1.2.1      plyr_1.8.4       
 [5] compiler_3.4.3    git2r_0.21.0      shinyjs_1.0       tools_3.4.3      
 [9] digest_0.6.15     tibble_1.4.2      jsonlite_1.5      evaluate_0.10.1  
[13] memoise_1.1.0     gtable_0.2.0      debugme_1.1.0     pkgconfig_2.0.1  
[17] rlang_0.2.0.9000  reprex_0.1.2      cli_1.0.0         rstudioapi_0.7   
[21] curl_3.1          yaml_2.1.16       withr_2.1.1.9000  httr_1.3.1       
[25] stringr_1.3.0     knitr_1.20        rprojroot_1.3-2   grid_3.4.3       
[29] glue_1.2.0        R6_2.2.2          rmarkdown_1.8.7   callr_2.0.2      
[33] magrittr_1.5      whisker_0.3-2     backports_1.1.2   scales_0.5.0.9000
[37] htmltools_0.3.6   rsconnect_0.8.5   assertthat_0.2.0  mime_0.5         
[41] xtable_1.8-2      colorspace_1.3-2  httpuv_1.3.6      labeling_0.3     
[45] utf8_1.1.3        stringi_1.1.6     miniUI_0.1.1      lazyeval_0.2.1   
[49] munsell_0.4.3     crayon_1.3.4     

ryanpeek commented 6 years ago

Huh! Interesting. I have found some issues relating to reading in tibbles and working with tibbles. I often convert to data.frames to avoid dealing with some of these sorts of things, but haven't seen this specific issue before.

Same issue is resolved if after you create your object, you convert to data.frame:

yearly_sex_counts <- yearly_sex_counts %>% as.data.frame()

Then this code works as is: ggplot(data=yearly_sex_counts, aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap( ~ species_id)

mkclapp commented 6 years ago

Woah! @ryanpeek 's code coercing yearly_sex_counts into a dataframe worked for me to make the un-funky line plot.

So ... tibbles. I think I missed what a tibble was when I was gone the first few weeks of the quarter. Are you all saying that the tibble somehow retains the grouping from when we used pipes and group_by to subset the data from surveys_complete into yearly_sex_counts? So if we rearrange the way the data were grouped in the tibble (passing sex in as the first grouping variable), we should also get the proper plot...?

Answering my own question here... so when I run my original code (below):

yearly_sex_counts <- 
  surveys_complete %>% 
  group_by(year, species_id, sex) %>% 
  summarize(n = n())

And then make the following ggplot:

ggplot(data = yearly_sex_counts, 
       aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(~species_id, nrow = 5) +
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

I get the funky plot.

But when I start from scratch and group_by with sex first instead of year first (like so):

yearly_sex_counts <- 
  surveys_complete %>% 
  group_by(sex, species_id, year) %>% 
  summarize(n = n())

The same ggplot code above produces the "correct" line plot (separate lines for each sex, colored appropriately).

BAM. Thanks, everyone!

mikoontz commented 6 years ago

Whoa! That's interesting that the ordering in the group_by() matters.

You are right. A tibble is the tidyverse version of a dataframe. It consists of the rows and columns of a dataframe, plus some additional attributes that tell R some more information about it. Usually these attributes work behind the scenes to do things. The group_by() function does indeed "tag" the tibble with some additional information to tell R how to split it before applying some function (like your summarize(n = n()) code).

You can use the ungroup() function to strip the tibble of its grouping attributes. Converting to a dataframe drops them too, which is why that works, as does running:

ggplot(data = ungroup(yearly_sex_counts), aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap( ~ species_id)
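
One way to see the lingering grouping directly is to inspect it with dplyr. This is a minimal sketch: `toy_surveys` is a hypothetical stand-in for surveys_complete with made-up rows. With multiple grouping variables, summarize() drops only the last one by default, so the order inside group_by() decides which grouping is left behind. Whether the development version of ggplot2 actually reads that leftover grouping is the hypothesis in this thread, not something the sketch verifies.

```r
library(dplyr)

# Hypothetical toy data standing in for surveys_complete (rows are invented)
toy_surveys <- data.frame(
  year       = c(2000, 2000, 2000, 2001, 2001, 2001),
  species_id = c("DM", "DM", "DM", "DM", "DM", "DM"),
  sex        = c("F", "M", "M", "F", "F", "M")
)

# Year first: after summarize(), the tibble is still grouped by year and species_id
year_first <- toy_surveys %>%
  group_by(year, species_id, sex) %>%
  summarize(n = n())

# Sex first: after summarize(), the tibble is still grouped by sex and species_id
sex_first <- toy_surveys %>%
  group_by(sex, species_id, year) %>%
  summarize(n = n())

group_vars(year_first)                    # "year" "species_id"
group_vars(sex_first)                     # "sex" "species_id"

# Both workarounds from the thread leave no grouping behind
group_vars(ungroup(year_first))           # character(0)
is_grouped_df(as.data.frame(year_first))  # FALSE
```

This matches what was observed above: with year first, the leftover grouping includes year, while with sex first it includes sex, which happens to be the grouping the line plot needs.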

mikoontz commented 6 years ago

Hey, @MCMaurer -- good thought on checking the development versions of those few packages whose development versions are also installed when installing the developer version of ggplot2 from github. Those still don't seem to be the ones that make a difference!

I install ggplot2 from GitHub (also getting the .9000 versions of scales, withr, and rlang). I run the code we've been running and get a bad plot. I then close out of R, open up a new session and run install.packages(c("scales", "withr", "rlang")) to keep the development version of ggplot2 but use the CRAN versions of scales, withr, and rlang. I still get a weird plot then. Must be another package that I had updated during my Tools > Check Package Updates run.

mkclapp commented 6 years ago

Ok, cool. And one last question before I think we can consider this case closed?

I'm assuming that's also why moving the data = and aes() into the geom_line also produced the proper plot-- because...? giving data to the geom_line() instead of the big ole storage box of ggplot() somehow strips that residual grouping away as well? This is where I realize I don't actually understand what geoms do.

MCMaurer commented 6 years ago

@mikoontz whoa.... well if it's not any of those 3 packages, it's gotta be some other ggplot2 dependency, because I went from normal to funky after doing only the ggplot2 install from github.

EDIT Since it didn't seem to be any of those 3 dev-version packages, I checked through ggplot2's other imports and then checked the differences between your early versions (where you get the normal plot) and mine (where I got the funky plot). Here's what I got:

Mine: grid_3.3.3, Mike's: grid_3.4.2
Mine: reshape2_1.4.3, Mike's: reshape2_1.4.2

Those seem to be the only differences, other than scales, withr, and rlang, which you said didn't cause the issue.

@mkclapp wow, that's wild that the order in group_by makes the difference, I have no idea how a tibble stores prior groupings...

As for moving the data into geom_line, this is a handwavy answer, but I think it's just that different geoms have different ways they change your data to make it work for them. I think putting data in ggplot allows the underlying grouping to carry through and override everything else even though you said color=sex, whereas putting it into geom_line allows geom_line to look at color=sex and then decide to override the grouping the tibble already had hidden.

(on a related note, in addition to changing the order of group_by, moving data to geom_line, or changing from tibble to data.frame, my earlier answer of keeping data in ggplot and adding group=sex still works too)